

A Neural Network for Learning the Meaning of Objects and Words from a Featural Representation

Mauro Ursino, Cristiano Cuppini, Elisa Magosso
Department of Electrical, Electronic and Information Engineering, University of Bologna, Bologna, Italy

Running title: Learning the meaning of objects and words

Address for correspondence: Mauro Ursino, Full Professor of Biomedical Engineering, Department of Electrical, Electronic and Information Engineering, Viale Risorgimento 2, I-40136 Bologna. Email: [email protected]


ABSTRACT

The present work investigates how complex semantics can be extracted from the statistics of input features, using an attractor neural network. The study is focused on how feature dominance and feature distinctiveness can be naturally coded using Hebbian training, and how similarity among objects can be managed. The model includes a lexical network (which represents word-forms) and a semantic network composed of several areas: each area is topologically organized (similarity) and codes for a different feature. Synapses in the model are created using Hebb rules with different values for pre-synaptic and post-synaptic thresholds, producing patterns of asymmetrical synapses. This work uses a simple taxonomy of schematic objects (i.e., a vector of features), with shared features (to realize categories) and distinctive features (to have individual members) with different frequencies of occurrence. The trained network can solve simple object recognition tasks and object naming tasks by maintaining a distinction between categories and their members, and providing a different role for dominant features vs. marginal features. Marginal features are not evoked in memory when thinking of objects, but they facilitate the reconstruction of objects when provided as input. Finally, the topological organization of features allows the recognition of objects with some modified features.

Keywords: semantic memory, lexical memory, conceptual representation, Hebb rule, dominant features, category formation


INTRODUCTION

The manner in which concepts are organized and stored in our long-term memory is crucial for many processes in human cognition, including object recognition, action planning, language, episodic memory and the formation of abstract ideas. The organization of conceptual information is not merely a theoretical issue, but directly affects behavior, since it governs the modality through which the external world determines our concepts, and the manner through which concepts influence action by transmitting information among connected brain regions. For this reason, the study of semantic memory via theoretical models (either qualitative or mathematical) plays an important role in modern cognitive neuroscience (see, among others, Cappa, 2008; Gibbs, 2003; Hart et al., 2007; Kiefer and Pulvermüller, 2012; Pulvermüller and Fadiga, 2010; Tyler and Moss, 2001). Several questions on the functioning of semantic memory deserve particular attention, and are still not completely understood. How are categories formed from our sensory-motor experience to distinguish between superordinate concepts, i.e., categories, and subordinate concepts, i.e., members of a given category? What is the role of different attributes of an object in recognition, given that not all attributes are equally important? How can a word-form evoke its semantic content, which aspects of semantic information are necessary for lexical activation, and which are unnecessary? Moreover, how can we explain the different response times (e.g., in object naming tasks) reported in the literature? How can a model deal with the similarity among concepts? Finally, how can a model explain semantic deficits observed in neurological patients with brain damage?

A widely held hypothesis addressing these questions is that our concepts are represented in memory as a collection of features condensing the sum of our previous sensory, motor and emotional experiences (Smith et al., 1974; Cree, McNorgan, and McRae, 2006; McRae, de Sa, and Seidenberg, 1997; Rosch et al., 1976). This idea is exploited in several connectionist models, assuming that information on different features is connected and mutually interrelated via a network of processing units, so that activity in one part of this network spreads to the other parts. Within this generic framework, many controversial issues still surround the role of the different features, the true nature of their neural organization, and the computational structure allowing our experience to be translated into concepts and the way concepts affect behavior. Two main theoretical issues predominate in a feature-based connectionist model: i) the role of the different features concurring to form concepts: not all features have the same importance in object recognition, nor are all spontaneously evoked when thinking of a concept; ii) the connections among these features, and how these connections are induced by experience.

The aim of the present work was to improve two major aspects of our previous model of semantic memory (Ursino, Magosso, & Cuppini, 2009; Ursino, Cuppini, and Magosso, 2010; Ursino, Cuppini, and Magosso, 2011): the significance of features, assuming that features can have a different role in object storage and recovery, and the similarity among features, demonstrating that an object can be recognized even if some of its features are changed compared with those used during training.

The paper is structured as follows. First, the two theoretical issues outlined above (i.e., the characteristics of semantic memory features, and the structure of trained connections) will be analyzed with reference to the current state of the art. We will first examine the main common descriptors of semantic features according to the literature, providing the necessary definitions. Then, we will review the main connectionist models used in past years, briefly summarising their virtues and limitations. Subsequently, our model will be described in detail, focusing on its new aspects. Results show that the model can simulate many aspects of semantic memory not previously incorporated. Namely, it can discriminate between dominant and marginal features and assess their role in object naming and word recognition tasks; it leads to the formation of categories with superordinate and subordinate concepts; it can exploit similarity to recognize concepts; and it provides several predictions on time responses.


Methodological aspects

The main feature descriptors and their role within a model

As discussed in a recent paper by Montefinese et al. (2013, page 355), the idea that concepts are represented in memory as collections of features is largely accepted; however, feature descriptors are not merely binary in type (i.e., simply present or absent), but can be characterized along several graded dimensions. A first important descriptor of a feature is its distinctiveness. We say that a feature is distinctive if it is important to distinguish a given concept from the others. Of course, distinctiveness is a graded quantity rather than an all-or-none property. At the opposite extreme of the same dimension, a feature is said to be shared if it belongs to more than one concept simultaneously. A common idea is that shared properties are fundamental for the formation of categories, whereas distinctive properties allow individual concepts to be recognized (Barsalou, 2008). Distinctiveness and sharedness are global properties, i.e., they are defined with reference to the whole semantic memory (Sartori and Lombardi, 2004; Montefinese et al., 2013). Conversely, other properties are local, i.e., they refer to a single concept. Local properties are used to quantify how important a feature is for the given concept, for instance whether a feature spontaneously springs to mind when thinking about a given concept, or whether it is essential to recognize a concept. A common measure of the importance of a feature is its dominance, which quantifies the feature production frequency: it indicates the percentage of participants who list a given feature for a concept in a feature listing task (McRae et al., 1994; Sartori and Lombardi, 2004). More recently, a variation of the same idea, named accessibility, was proposed by Montefinese et al. (2013). Accessibility accounts not only for the production frequency (like dominance) but also for the order in which features are produced in the feature listing task. However, it should be emphasized that accessibility or dominance are not the same as importance: a feature may be highly accessible since it is important for the concept, but a feature may also be accessible (regardless of its importance) for other reasons, for instance because it frequently occurs in our experience. At the other extreme of the same dimension, a feature that does not emerge at all (or just rarely emerges) can be said to be marginal for the given concept. At this point we have two main dimensions to quantify the role of features in a semantic structure: a global dimension, distinctiveness vs. sharedness, and a local dimension, dominance (or accessibility) vs. marginality. Some authors introduced a non-linear combination of distinctiveness and dominance to define a more comprehensive dimension, named relevance (Sartori and Lombardi, 2004) (similarly, a combination of distinctiveness and accessibility was named significance by Montefinese et al., 2013). A similar (but not equivalent) distinction among features was proposed by Rips, Shoben and Smith, who distinguished between "characteristic features", which frequently characterize the members of a category but are not present in all of them, and "defining features", which are necessarily important for the concept. Moreover, the number of defining features contained in a concept decreases as the concept becomes more and more general.

Although very useful, the previous definitions can be problematic in a feature-based model where it is crucial to have a clear understanding of whether a property is an input (or independent quantity; i.e., it is a property of the environment) or an output (or dependent quantity; i.e., it is an emergent property of the model). To understand this major aspect, let us consider the process of feature learning in the model. In the present paper, we train the model by giving some features as input, together with the corresponding word-form, and associate them via Hebbian learning. Subsequently, we test which features are essential to evoke a word-form, and which features are spontaneously evoked in response to a given word. In this scenario, distinctiveness (or sharedness) is an input or independent property (it depends on how many objects in the environment contain a given feature, and this property is used as an input to our model). Conversely, dominance (or accessibility, or characteristic/defining features) is an output property, since it represents which features, and in which order, are evoked by the network after training through its connectivity, or how a feature can evoke a concept. In this regard, this work will consider an emergent dominance to indicate the likelihood that a feature spontaneously pops out when we recognize an object, i.e., it is evoked by the model in response to a word-form or in response to a partial semantic cue.

At this point a further question arises: which characteristics of the environment (i.e., which input properties) determine the emergent dominance? This question requires the introduction of a few additional input dimensions of a feature. First, a feature may occur during our experience of an object with a different frequency. It is expected that a feature occurring more frequently is stored more strongly in the semantic memory, hence it acquires a greater dominance. Indeed, input dominance can be considered the likelihood that a property is present in the environment. Second, a feature may have a great emotional impact (e.g., "fire burns") or a great functional utility (e.g., "a coat is used to warm you up"); all these characteristics of a feature may affect its emergent dominance. Finally, a further input property of features should be mentioned, i.e., correlation. We say that two features are highly correlated if they tend to occur frequently together. It is worth noting that correlation is different from frequency. For instance, two features may occur just in 30% of cases within a given concept, but may always occur together. In this case, they would have a 30% frequency but a 100% mutual correlation. Correlation might improve the robustness of the model (correlated features tend to be evoked together) and may also affect the member-category relationship.


A list of the previous dimensions for a feature, with a distinction as to whether they should be considered inputs to the network or emergent properties, together with some behavioral effects, is summarized in Table 1. For the sake of simplicity, the present study used just two properties as inputs: distinctiveness and frequency of occurrence. The effects of the other input properties (correlation, emotional impact and functional utility) are not incorporated in the model, but may be assessed in future works.

Connectionist models

One of the main purposes of a model is to analyze how the output properties and behaviors listed in Table 1 (dominance, accessibility, reaction times) can spontaneously emerge as a consequence of the input characteristics, using suitable learning rules for connections. Several models have been proposed in the past to deal with these problems. One of the first connectionist models, proposed by Hinton, could already reproduce basic properties of semantic memory, such as pattern completion, generalization and inheritance. The main limitation of that work, however, was that weights were set a priori, and were not based on past experience. In a subsequent paper, Hinton (1986) trained the network with the backpropagation algorithm, and showed that it can disclose implicit semantic features not originally used as input, and represent them in the hidden layer. This work was subsequently expanded by Rumelhart et al. (Rumelhart, 1990; Rumelhart and Todd, 1993) and Rogers and McClelland (Rogers et al., 2004; Rogers and McClelland, 2008) who used a feedforward schema (not an attractor network) with the main objective of investigating which representation can develop in the hidden layers. The Rumelhart model was further extended by Rogers and McClelland (2004) who developed a model in which perceptual, linguistic and motor representations communicate via a heteromodal region.


These models were based on a feedforward schema trained with backpropagation. Although very powerful in learning relations among features, feedforward networks are not well suited for studying the temporal aspects of our memory, since information propagates among layers in a single step. For this reason, several models adopted a recurrent architecture (an attractor model), i.e., they used recurrent synapses among units in an effort to better understand the dynamic aspects of semantic memory (such as priming, or response times). Plaut et al. (Plaut and Shallice, 1993; Plaut, 1995) analyzed semantic aphasia and associative priming; Cree et al. and McRae et al. (McRae, 2004; McRae et al., 2005; Cree et al., 2006) investigated the statistical relations among features (derived from feature verification tasks). Specifically, Cree et al. (2006) found that distinctive features play a greater role in activating a concept than shared features, since stronger synapses are formed between word units of a concept and the distinctive features of the same concept. O'Connor et al. (2009) demonstrated that a single layer of feature nodes can represent both subordinate and superordinate concepts, without the need for any a priori hierarchical organization. The previous models share several aspects with the present one, such as the use of attractor networks and the different role given to the different features. However, these models trained synapses using the recurrent backpropagation through time algorithm, which is physiologically implausible. Recurrent networks trained with Hebb rules have a long tradition in neural networks (Hopfield, 1984; Amit et al., 1989). McRae et al. (1997) used an attractor network trained with the Hebb rule to investigate the role of feature correlation in the organization of semantic memory and explained several semantic priming effects. Kawamoto (1993) demonstrated that attractor basins are crucial to understand phenomena such as priming, deep dyslexia and ambiguity resolution. Miikkulainen (Miikkulainen 1993; Miikkulainen 1997) developed a model consisting of two self-organizing maps (one for lexical symbols and the other for word meaning) and associative connections between them based on Hebbian learning. A similar model was used by Silberman et al. (2007) to study


associations among words. Siekmeier and Hoffman (2002) used the Hebb rule in a Hopfield network to compare semantic priming in normal subjects and schizophrenic patients. According to this brief analysis, models using backpropagation training rules can learn sophisticated relationships among features, which can simulate many behaviors of the semantic memory. However, this rule is not physiological. Conversely, the Hebb rule is much more physiological, but leads to the formation of symmetrical synapses (as in the original Hopfield model). As investigated further on in this work, a distinction between shared vs. distinctive features and between dominant vs. marginal features is essential to simulate many aspects of semantic memory. In particular, shared features are fundamental for category formation, while distinctive features allow individual members to be recognized. Furthermore, only dominant features spontaneously come to mind when we are thinking of an object. Learning this distinction directly from data requires that synapses become asymmetrical. In fact, a marginal feature (for instance, "the stomach of a ruminant") should be able to evoke the dominant features of the same concept; hence, it should send strong synapses, but not vice versa. This complies with a principle of parsimony, avoiding less important features being evoked spontaneously (we do not habitually think of the stomach of a ruminant). An analogous asymmetry must hold between shared and distinctive features: strong synapses from shared features toward the distinctive features of individual members must not occur, to avoid a category evoking all its members. Asymmetric synapses can be obtained using Hebb rules with a different threshold for presynaptic and postsynaptic neurons (as in the covariance rule, or in the BCM rule, see Dayan and Abbott, 2001). However, to our knowledge, a thorough analysis of this problem in the domain of semantic memory is still lacking. The present paper investigates the use of Hebb rules with different thresholds, both in terms of synapses linking features, and synapses linking features and word-forms.


2 MODEL DESCRIPTION

The model incorporates two neural networks, as illustrated in the schematic diagram in Fig. 1. The first (the "semantic network") describes the representation of an object as a collection of sensory-motor features. These are assumed to spread along different cortical areas (both in the sensory and motor cortex and perhaps also in other areas, for instance, emotional) and are organized topologically according to a similarity principle. A second chain of neurons (the "lexical network") codes for a word-form, and is associated with an individual object representation. However, the semantic and the lexical nets become strongly interconnected after learning, hence they work in a strongly integrative way to constitute a single highly interactive lexical-semantic system.

2.1 The semantic and lexical networks

We adopt an exemplary semantic network with nine feature areas. Each area is described through a 20x20 square of neural units (400 total units); for computational simplicity, these nine areas are coded through a single semantic matrix, thus consisting of 60x60 units (i, j = 1, …, 60). However, we imposed no communication at the border of two different areas. In this work the lexical network comprises a single chain with 8 units (i = 1, …, 8). We use the superscripts S and L to denote a quantity belonging to the semantic or lexical network, respectively. A quantity connecting the two networks is represented with two superscripts, the first denoting the recipient region, the second the donor region. Each neural unit in the semantic net is denoted with two subscripts (ij or hk) representing its position within the network. Consequently, a synapse between two neural units in the semantic net has four subscripts (for instance, ij,hk), the first two representing the position of the post-synaptic neuron, the second two the position of the pre-synaptic neuron. Each neural unit in the lexical net has just one subscript, and synapses linking semantic elements with a lexical neural unit are denoted by three subscripts: one (i or h) for the lexical unit and two (ij or hk) for the semantic element.
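To make this bookkeeping concrete, the array shapes can be sketched as follows (a minimal NumPy illustration; the variable names and the row-major packing of the nine areas into a single 60x60 matrix are our own assumptions, not notation from the paper):

```python
import numpy as np

N_AREA = 20   # side of each of the nine feature areas (20x20 units)
N_SEM = 60    # the nine areas packed into one 60x60 semantic matrix
N_LEX = 8     # word-form units in the lexical network

x_sem = np.zeros((N_SEM, N_SEM))   # semantic activities x^S_ij
x_lex = np.zeros(N_LEX)            # lexical activities x^L_i

# Synapses: four subscripts within the semantic net, three across networks
# (dense arrays are used here only for clarity; they are large in memory).
W_SS = np.zeros((N_SEM, N_SEM, N_SEM, N_SEM))   # W^SS_{ij,hk} inter-area, excitatory
W_SL = np.zeros((N_SEM, N_SEM, N_LEX))          # W^SL_{ij,h}  lexical -> semantic
W_LS = np.zeros((N_LEX, N_SEM, N_SEM))          # W^LS_{i,hk}  semantic -> lexical, excitatory
V_LS = np.zeros((N_LEX, N_SEM, N_SEM))          # V^LS_{i,hk}  semantic -> lexical, inhibitory

def area_of(i, j):
    """Index (0..8) of the feature area containing semantic unit (i, j)."""
    return (i // N_AREA) * 3 + (j // N_AREA)
```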

The representation of an object in the semantic network is distributed over the nine areas. However, we assumed that the representation of a single feature is localist, i.e., a feature is coded by the position of a neuron within its area. Nevertheless, thanks to Mexican-hat lateral synapses, the activation of a feature always simultaneously triggers the activation of proximal neurons, i.e., we have an "activation bubble" within the corresponding area, which implements a similarity principle. For the sake of simplicity, this work always assumed that just one bubble is active within each area of the semantic network at a time, i.e., we did not simulate the condition with multimodal features (e.g., two bubbles in the same area, two colors in the same color map). Conversely, object representation in the lexical net is purely localist: just one neuron codes for a word-form.

Mathematical equations

Each unit in the semantic and lexical networks receives a global input (denoted by u) and produces an output activity (denoted by x). This relationship is described through a first order low-pass filter with time constant τ (Eq. 1), which reproduces the temporal pattern of the response, and a sigmoidal function (Eq. 2), which accounts for a lower threshold and upper saturation of neurons. Hence

$$\tau^A \frac{d\,x_{ij}^{A}(t)}{dt} = -x_{ij}^{A}(t) + H^{A}\!\left(u_{ij}^{A}(t)\right), \qquad A = S, L \tag{1}$$

where the superscript A denotes the network (semantic or lexical), $\tau^A$ is the time constant, which determines the speed of the answer to the stimulus, and $H^{A}\!\left(u^{A}(t)\right)$ is a sigmoidal activation function. In Eq. (1) we used two subscripts (ij) to denote the position of the unit, as in the semantic network. A similar equation, but with a single subscript, is used in the lexical network. The sigmoidal relationship is described by the following equation


$$H^{A}\!\left(u^{A}(t)\right) = \frac{1}{1 + e^{-\left(u^{A}(t) - \vartheta^{A}\right)\, p^{A}}} \tag{2}$$

where $p^{A}$ is a parameter which sets the central slope of the sigmoidal relationship, and $\vartheta^{A}$ represents its central abscissa. It follows from Eq. 2 that the upper saturation is 1 (i.e., all neural activations are normalized to the maximum). The input u in Eq. 1 has different expressions in the semantic and lexical networks.
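As a concrete reading of Eqs. (1)-(2), the following sketch integrates the first-order dynamics with a simple Euler step (the step size dt and the function names are our own choices, not part of the paper):

```python
import numpy as np

def sigmoid(u, theta, p):
    """Eq. (2): sigmoidal activation with central abscissa theta and slope p."""
    return 1.0 / (1.0 + np.exp(-(u - theta) * p))

def euler_step(x, u, tau, theta, p, dt=0.1):
    """One Euler step of Eq. (1): tau * dx/dt = -x + H(u).
    Works element-wise for the semantic (2-D) and lexical (1-D) arrays alike."""
    return x + (dt / tau) * (-x + sigmoid(u, theta, p))
```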

Semantic network

The input to neurons in the semantic net (superscript A = S) is computed as the sum of four contributions

$$u_{ij}^{S}(t) = I_{ij}^{S}(t) + E_{ij}^{SS}(t) + L_{ij}^{SS}(t) + C_{ij}^{SL}(t) \tag{3}$$

$I_{ij}^{S}$ represents the external input for the neural unit in position ij, coming from the sensory-motor-emotional processing chain which extracts features. $I_{ij}^{S}$ may assume value 0 (absence of feature) or 1 (presence of feature). $E_{ij}^{SS}$ represents an excitatory coupling term coming from units in other areas of the semantic network (i.e., from neurons coding for a different feature, see Eq. 4), $L_{ij}^{SS}$ represents the input from lateral connections coming from units in the same feature area (Eqs. 5-9), while $C_{ij}^{SL}$ is a cross-network term, describing the connections coming from the lexical area (Eq. 10). These terms are implemented using different sets of weighted connections. The coupling terms within the semantic network have the following expressions

$$E_{ij}^{SS} = \sum_{h,k} W_{ij,hk}^{SS} \cdot x_{hk}^{S} \tag{4}$$

$$L_{ij}^{SS} = \sum_{h,k} \left( L_{ij,hk}^{SS,EX} - L_{ij,hk}^{SS,IN} \right) \cdot x_{hk}^{S} \tag{5}$$

where ij denotes the position of the postsynaptic (target) neuron, hk the position of the presynaptic neuron, and the sums extend to all presynaptic neurons in the semantic network.

The symbol $W_{ij,hk}^{SS}$ represents excitatory inter-area synapses, which form an auto-associative network within the semantics. These synapses are subject to learning (see below); however, we assume they are always equal to zero within the same area (i.e., only different features, coded in different areas, can be linked together by experience). We found the use of inhibitory synapses within the semantic area unnecessary; features that never occur together are simply not connected. The symbols $L_{ij,hk}^{SS,EX}$ and $L_{ij,hk}^{SS,IN}$ represent lateral excitatory and inhibitory synapses among neurons in the same area, and are used to implement a similarity principle. Accordingly, all terms $L_{ij,hk}^{SS,EX}$ and $L_{ij,hk}^{SS,IN}$ with neurons ij and hk belonging to different areas are set to zero. The lateral synapses assume a Mexican-hat disposition, i.e., an excitatory zone between proximal units surrounded by an inhibitory annulus among more distal units. Hence,

$$L_{ij,hk}^{SS,EX} = \begin{cases} L_{0}^{EX}\, e^{-d_{ij,hk}^{2}/(2\sigma_{ex}^{2})} & \text{if } ij \text{ and } hk \text{ are in the same area} \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

$$L_{ij,hk}^{SS,IN} = \begin{cases} L_{0}^{IN}\, e^{-d_{ij,hk}^{2}/(2\sigma_{in}^{2})} & \text{if } ij \text{ and } hk \text{ are in the same area} \\ 0 & \text{otherwise} \end{cases} \tag{7}$$

where $L_{0}^{EX}$ and $L_{0}^{IN}$ are constant parameters, which establish the strength of lateral (excitatory and inhibitory) synapses, $\sigma_{ex}$ and $\sigma_{in}$ determine the width of the excitatory and inhibitory portions of the Mexican hat, and $d_{ij,hk}$ represents the distance between the units at position ij and hk. To avoid boundary effects (i.e., that neurons close to the border do not receive enough synapses from the others), distances were calculated mimicking a spherical structure for each area. This is obtained using the following expression for the distance (where N = 20 denotes the side of each square area):

$$d_{ij,hk} = \sqrt{\Delta_{i,h}^{2} + \Delta_{j,k}^{2}} \tag{8}$$

where

$$\Delta_{l,m} = \begin{cases} |l-m| & \text{if } |l-m| \le N/2 \\ N - |l-m| & \text{otherwise} \end{cases} \tag{9}$$
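A compact sketch of Eqs. (6)-(9) is given below: it returns the net lateral weight (excitatory minus inhibitory, as used in Eq. 5) between two units, with the wrap-around distance of Eq. (9). The mapping of units to areas by integer division is our own convention, not taken from the paper.

```python
import numpy as np

N = 20  # side of each square feature area

def circular_delta(l, m):
    """Eq. (9): one-axis distance with wrap-around (mimicking a spherical area)."""
    d = abs(l - m)
    return d if d <= N / 2 else N - d

def lateral_weight(i, j, h, k, L0_ex, L0_in, sigma_ex, sigma_in):
    """Eqs. (6)-(8): Mexican-hat lateral weight between units (i,j) and (h,k).
    Returns 0.0 when the two units belong to different feature areas."""
    if (i // N, j // N) != (h // N, k // N):
        return 0.0
    d2 = circular_delta(i % N, h % N) ** 2 + circular_delta(j % N, k % N) ** 2
    excitation = L0_ex * np.exp(-d2 / (2.0 * sigma_ex ** 2))
    inhibition = L0_in * np.exp(-d2 / (2.0 * sigma_in ** 2))
    return excitation - inhibition   # net contribution entering Eq. (5)
```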

Finally, the cross-network term $C_{ij}^{SL}$ coming from the lexical area is calculated as follows

$$C_{ij}^{SL} = \sum_{h} W_{ij,h}^{SL} \cdot x_{h}^{L} \tag{10}$$

where $x_{h}^{L}$ represents the activity of the neuron h in the lexical area and the symbols $W_{ij,h}^{SL}$ are the synapses from the lexical to the semantic network (which are subject to Hebbian learning, see below).

Lexical network

The input to the lexical neural unit in the i-th position ($u_{i}^{L}$ in Eq. 1) includes just two terms, since the present model does not consider lateral links among lexical units. Hence

$$u_{i}^{L}(t) = I_{i}^{L}(t) + C_{i}^{LS}(t) \tag{11}$$

$I_{i}^{L}(t)$ is the input produced by an external linguistic stimulation, coming either from speech listening or phonemes reading. $I_{i}^{L}(t)$ assumes value 1 when the word-form is given to the network and zero otherwise.

$C_{i}^{LS}(t)$ represents the intensity of the input due to synaptic connections from the semantic network. Synapses from the semantic to the lexical network include both an excitatory term and an inhibitory term (denoted by $W_{i,hk}^{LS}$ and $V_{i,hk}^{LS}$, respectively) which are trained in different ways. This involves a more complex inhibitory-excitatory strategy. In fact, a word-form in the lexical area must be excited by the dominant features present in the semantic scenario, but inhibited when the semantic network includes an active feature not belonging to the object (i.e., we ask that the model does not recognize "absurd objects" containing extraneous features, since they were never met during the previous training phase!). In other words, features not involved in the representation of the object inhibit the corresponding lexical unit. Hence, we can write

$$C_{i}^{LS} = \sum_{h,k} W_{i,hk}^{LS} \cdot x_{hk}^{S} - \sum_{h,k} V_{i,hk}^{LS} \cdot x_{hk}^{S} \tag{12}$$

where $x_{hk}^{S}$ represents the activity of the neuron hk in the semantic net (see Eq. 1), $W_{i,hk}^{LS}$ is the strength of excitatory synapses and $V_{i,hk}^{LS}$ the strength of inhibitory synapses.
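Putting Eqs. (3)-(5) and (10)-(12) together, one step of the network simulation can be sketched as below (the tensordot calls simply implement the sums over the presynaptic indices; L_net stands for a precomputed array of net Mexican-hat lateral weights, and the array shapes follow the earlier sketch — all naming here is ours):

```python
import numpy as np

def semantic_input(I_sem, x_sem, x_lex, W_SS, W_SL, L_net):
    """Eq. (3): u^S = I^S + E^SS + L^SS + C^SL."""
    E = np.tensordot(W_SS, x_sem, axes=([2, 3], [0, 1]))    # Eq. (4)
    L = np.tensordot(L_net, x_sem, axes=([2, 3], [0, 1]))   # Eq. (5), net lateral term
    C = np.tensordot(W_SL, x_lex, axes=([2], [0]))          # Eq. (10)
    return I_sem + E + L + C

def lexical_input(I_lex, x_sem, W_LS, V_LS):
    """Eqs. (11)-(12): u^L = I^L + excitation - inhibition from the semantic net."""
    excitation = np.tensordot(W_LS, x_sem, axes=([1, 2], [0, 1]))
    inhibition = np.tensordot(V_LS, x_sem, axes=([1, 2], [0, 1]))
    return I_lex + excitation - inhibition
```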

Parameter assignment - The parameters and their numerical values are listed in Table 2. These values were given as follows. The time constants of semantic and lexical units ($\tau^{S}$ and $\tau^{L}$) have been given a value (a few milliseconds) normally used in deterministic mean-field equations (Ben-Yishai, Bar-Or, & Sompolinsky, 1995). In particular, these values can be chosen significantly smaller than the membrane time constant (Treves, 1993). The central abscissas, $\vartheta^{S}$ and $\vartheta^{L}$, have been assigned so as to have negligible neuron activity when the input is zero. The slopes of the sigmoidal relationships, $p^{S}$ and $p^{L}$, have been assigned a high value to have a rapid transition from silence to saturation. This allows a clearer presentation of the results, with units that signal the presence or absence of the given attribute. Parameters establishing the topological organization of the semantic area (i.e., the lateral excitation and lateral inhibition in the Mexican hat, $L_{0}^{EX}$, $L_{0}^{IN}$, $\sigma_{ex}$, $\sigma_{in}$) have been given the same values as in previous works (Ursino et al., 2010; Ursino et al., 2011). These values ensure that a stimulated neuron activates a bubble of surrounding neurons, with a topological distance as large as 1-2 units. However, these parameters may be varied to simulate a different attention to similarity (see Results section).

2.2 Training the network

Model equations

At the beginning of training all excitatory inter-area synapses in the semantic net, and all excitatory and inhibitory synapses between the semantic and the lexical nets, are set at zero. Training is then divided into two distinct phases. I) During the first, individual objects (described by part of their features, according to given statistics) are presented to the network one by one, and the inter-area synapses linking the different features are learned. Features have a different frequency of occurrence. II) In the second training phase, word-forms are given to the lexical network, together with part of their features in the semantic network, with the same statistics as in phase I, and the synapses linking the lexical and the semantic networks are learned. This phase included not only the word-forms denoting individual members of a category (with some of the distinctive features + some shared features), but also word-forms denoting categories (with the shared attributes of that category). All synapses are trained with a Hebbian rule modifying their weight on the basis of the correlation (or anticorrelation) between the presynaptic and postsynaptic activity. To account not only for long-term potentiation but also for long-term depression (see Dayan & Abbott, 2001) we assume that these activities are compared with a threshold. In particular, we used the following Hebbian rule for all excitatory synapses: they are reinforced when both the pre-synaptic and post-synaptic activities are above threshold, while they are reduced when the activities are negatively correlated. Hence, using the meaning of symbols explained above, this equation holds:

$$\Delta W_{ij,hk}^{AB} = \gamma_{ij,hk}^{AB} \left( x_{ij}^{A} - \vartheta_{post}^{AB} \right) \left( x_{hk}^{B} - \vartheta_{pre}^{AB} \right) \tag{13}$$

where the superscripts AB can assume the meanings SS, SL or LS depending on the particular synapse, $\Delta W_{ij,hk}^{AB}$ represents the change in the synapse strength due to the given presynaptic and post-synaptic activities, $\vartheta_{post}^{AB}$ and $\vartheta_{pre}^{AB}$ are thresholds for the postsynaptic and presynaptic activities, $\gamma_{ij,hk}^{AB}$ denotes a learning factor, $x_{ij}^{A}$ is the activity of the neural unit at position ij in the post-synaptic area, and $x_{hk}^{B}$ is the activity of the neural unit at position hk in the pre-synaptic area (note that we used two subscripts both for the presynaptic and the post-synaptic neurons, which is the more general case, although a single subscript is sufficient for neurons in the lexical net). Similarly, we used the following anti-Hebbian rule for the inhibitory synapses: these are weakened when both the pre-synaptic and post-synaptic activities are above threshold, and are strengthened when the activities are negatively correlated. Hence

$$\Delta V_{i,hk}^{LS} = -\delta_{i,hk}^{LS} \left( x_{i}^{L} - \varphi_{post}^{LS} \right) \left( x_{hk}^{S} - \varphi_{pre}^{LS} \right) \tag{13b}$$

where we use the symbols φ and δ for the threshold and the learning rate, to avoid confusion with those used for the excitatory synapses. The previous rules, however, need some adjustments to be more physiological. First, when both presynaptic and postsynaptic activities are low, no weight change should occur. Hence (let us consider an excitatory synapse)

$$\text{if } x_{ij}^{A} < \vartheta_{post}^{AB} \text{ and } x_{hk}^{B} < \vartheta_{pre}^{AB}, \text{ then } \Delta W_{ij,hk}^{AB} = 0 \tag{14}$$

Second, synapses cannot increase indefinitely, but must reach a maximum saturation level. This is achieved in our model by progressively reducing the learning rate when synapses approach their maximum (denoted by $W_{max}^{AB}$). We have

$$\gamma_{ij,hk}^{AB} = \gamma_{0}^{AB}\, \frac{W_{max}^{AB} - W_{ij,hk}^{AB}}{W_{max}^{AB}} \tag{15}$$

where $\gamma_{0}^{AB}$ is the learning rate when the synapse is zero. Finally, a synapse cannot become negative (otherwise excitation would be converted into inhibition, which is not physiologically acceptable). Hence, when computing the new synapse value we have:

$$W_{ij,hk}^{AB} \leftarrow \mathrm{Max}\!\left( W_{ij,hk}^{AB} + \Delta W_{ij,hk}^{AB},\ 0 \right) \tag{16}$$

(where the arrow indicates that the value computed at the right-hand member is assigned to the left-hand member). Eqs. 13-16 have been applied in final steady state conditions (i.e., 200 ms after the input presentation) to avoid any effect of the transient changes in activity. Of course, similar rules as in Eqs. (14)-(16) hold for the inhibitory synapses, but are not reported for brevity.

A further aspect has been included regarding the excitatory synapses from the semantic to the lexical net. In fact, a word-form may be associated with a different number of dominant features (some words may have three dominant features, others may have five dominant features, and so on). The question is: how many features must be evoked in the semantic net, so that the concept is recognized and the corresponding word-form evoked in the lexical layer? If the Hebb rule with

upper saturation as in Eqs. (15) and (16) were used, a puzzling result would emerge after prolonged training: the same number of features would be required to recognize a concept, independently of the number of dominant features. We do not think this is correct. Hence, we introduced a strong assumption that requires further validation: during an object naming task, when the lexical net does not receive any external input, approximately the same percentage of dominant features is required to recognize an object, independently of the overall number of dominant features. To reach this objective, we introduced a normalization to the synapses entering a word-form. Hence, the following rule is used to update the synapses $W^{LS}$:

$$\Omega_{i}^{LS} = \sum_{h,k} \mathrm{Max}\!\left( W_{i,hk}^{LS} + \Delta W_{i,hk}^{LS},\ 0 \right)$$

$$W_{i,hk}^{LS} \leftarrow \begin{cases} \mathrm{Max}\!\left( W_{i,hk}^{LS} + \Delta W_{i,hk}^{LS},\ 0 \right) & \text{if } \Omega_{i}^{LS} \le \Omega_{max} \\[4pt] \mathrm{Max}\!\left( W_{i,hk}^{LS} + \Delta W_{i,hk}^{LS},\ 0 \right) \cdot \Omega_{max} / \Omega_{i}^{LS} & \text{if } \Omega_{i}^{LS} > \Omega_{max} \end{cases} \tag{17}$$

where the symbol $\Omega_{i}^{LS}$ denotes the sum of all excitatory synapses entering the word-form at position i (hence, the sum is extended to all neural units hk in the semantic net). Eq. (17) substitutes Eqs. 15 and 16 for the synapses entering the lexical net. According to Eq. (17), the sum of the synapses entering a given word-form can never overcome the maximum value $\Omega_{max}$. The choice of the upper saturation $\Omega_{max}$ determines the percentage of features required to evoke the word-form. To clarify this point, let us consider the sigmoidal activation function of a word-form depicted in Fig. 2, and let us name $u_{sat}$ the input level necessary to excite the neuron close to saturation (in our model $u_{sat} \approx 1$). Assuming $\Omega_{max} = 2 u_{sat}$ signifies that about half of the dominant features are required to evoke a word-form; $\Omega_{max} = u_{sat}$ signifies that all the dominant features are required to excite the word-form to its upper level. This work adopts the assumption that all dominant features must be evoked in the semantic net to excite a word-form. Hence we chose $\Omega_{max} = u_{sat}$. However, we stress that other less stringent options can be implemented, with a different choice of the parameter $\Omega_{max}$. Hence, the model is flexible enough to adapt to different scenarios.
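The learning rules of Eqs. (13)-(17) can be summarized in code as follows (a schematic sketch: the argument names, γ0 and the per-word normalization function are named by us; the rule for the inhibitory synapses, Eq. (13b), differs only in sign and thresholds and is not repeated):

```python
import numpy as np

def hebb_delta(x_post, x_pre, gamma, th_post, th_pre):
    """Eqs. (13)-(14): thresholded Hebbian change; no change when both
    activities lie below their respective thresholds."""
    if x_post < th_post and x_pre < th_pre:
        return 0.0
    return gamma * (x_post - th_post) * (x_pre - th_pre)

def update_excitatory(W, x_post, x_pre, gamma0, W_max, th_post, th_pre):
    """Eqs. (15)-(16): the learning rate shrinks as W approaches W_max,
    and the synapse is clipped at zero from below."""
    gamma = gamma0 * (W_max - W) / W_max
    return max(W + hebb_delta(x_post, x_pre, gamma, th_post, th_pre), 0.0)

def normalize_word_synapses(W_row, omega_max):
    """Eq. (17): rescale the excitatory synapses entering one word-form so that
    their sum never exceeds omega_max (this replaces Eqs. 15-16 for W^LS)."""
    total = W_row.sum()
    return W_row if total <= omega_max else W_row * (omega_max / total)
```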

Parameter assignment - Learning the semantic net

A fundamental point to achieve a correct training process concerns the choice of appropriate values for the presynaptic and post-synaptic thresholds in the Hebb rule. Let us start with the semantic net. In our opinion, a valid semantic network should meet the following requirements.

Dominance vs. non-dominance - a) Dominant features must be evoked by all other features of the object (either dominant or non-dominant), hence they should receive strong input synapses. Conversely, they should send strong output synapses only to the other dominant features. b) Non-dominant features should not be evoked by the other features of the object (i.e., they receive weak input synapses) but favor the reconstruction of the object (hence, they send strong output synapses toward dominant features).

Shared vs. distinctive - c) Features shared by different objects in a category should recall all the other shared dominant features in the category, but they should not recall distinctive features of the individual members (otherwise thinking of a category would also evoke the features of its individual members). d) Conversely, distinctive features of an individual object should recall not only other dominant distinctive features of the same object but also the dominant shared features of the categories it belongs to (e.g., a distinctive feature of a given animal should also recall the shared feature "has a tail"). These requirements are schematically summarized in the block diagram of Fig. 3.

This particular behavior of semantic synapses can be achieved assuming that the threshold for post-synaptic activity is quite high (we assume $\vartheta_{post}^{SS} = 0.5$, i.e., at the midpoint between maximum inhibition and maximum excitation) whereas the threshold for the pre-synaptic activity is low (we assume $\vartheta_{pre}^{SS} = 0.05$, close to inhibition. A value just slightly above zero has been chosen here to avoid a


residual neuronal activity, induced by connections in the semantic net, causing an undesired synapse reinforcement). Table 3 summarizes the different conditions, where [xpost xpre] denote the post-synaptic and pre-synaptic activities. The situation in the second line ([0 1]) occurs at the synapses leaving a shared feature toward a distinctive feature, when the shared feature appears in a member of the category which does not contain that distinctive feature; the same situation also occurs at the synapses leaving a frequent feature toward a non-frequent feature, when the first is perceived and the second is not. Hence, after sufficient training, due to the statistics of feature occurrence, shared features will send poor synapses toward distinctive features, and dominant features will send poor synapses toward non-dominant features. The situation in the third line ([1 0]) occurs at the synapses which leave a non-dominant feature toward a dominant feature, when the non-dominant feature is not perceived (the pre-synaptic activity is often close to zero, since non-dominant features are often absent). As a consequence, a distinctive non-dominant feature continues to send strong synapses toward all dominant features, with only poor weakening. The same condition also occurs if we consider a synapse from any distinctive feature toward a dominant shared feature.

The previous considerations require two comments on the concept of dominance used throughout this work. First, here we consider the frequency of occurrence as the unique aspect characterizing dominance. We are aware this is a limitation, but it was adopted to simplify our analysis. Second, the level of dominance (i.e., whether a feature is dominant or not based on its frequency) is strictly related to the value used for the post-synaptic threshold $\vartheta_{post}^{SS}$. The higher this threshold, the higher the frequency level required to have dominance. A simplified analysis is given in Appendix 1.

The saturation value for semantic synapses ($W_{max}^{SS}$) has been assigned the same value as in previous papers (Ursino et al., 2010; Ursino et al., 2011). This value was chosen so that, when synapses are close to saturation, the activation bubbles of two features in the semantic net can excite a target neuron in another semantic area above the central portion of the sigmoidal relationship (i.e., above the central abscissa $\vartheta^{S}$). As shown in the Results section, this choice ensures that in most cases two dominant features are sufficient to evoke a concept completely. The learning rate, $\gamma_{0}^{SS}$, has been decreased compared with the previous paper: with the new value, training the semantics of an object occurs more gradually, requiring about one hundred presentations of the same object to obtain effective synapses. This allows a more accurate storage of the statistics of input features (see Results).
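To make the link between a feature's frequency of occurrence and the post-synaptic threshold explicit, a simplified expected-value argument (our own sketch, in the spirit of the Appendix 1 analysis mentioned above; it assumes binary activities, a fixed learning rate γ, and a post-synaptic feature ij occurring with frequency f) gives, whenever the presynaptic feature hk is active:

$$E\!\left[\Delta W_{ij,hk}^{SS} \mid x_{hk}^{S}=1\right] \approx \gamma \left[\, f\,(1-\vartheta_{post}^{SS}) - (1-f)\,\vartheta_{post}^{SS} \,\right]\!\left(1-\vartheta_{pre}^{SS}\right) = \gamma\,(1-\vartheta_{pre}^{SS})\,\left(f - \vartheta_{post}^{SS}\right)$$

On average the synapse therefore grows only when $f > \vartheta_{post}^{SS}$, which is why the post-synaptic threshold directly sets the frequency level required for a feature to become dominant.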

Parameter assignment - Learning the lexical net

When training the lexical network, as in the semantic training described above, we used values of the learning rates small enough to ensure a gradual convergence (basically, about 100 presentations of each word-form with the associated object). The excitatory synapses from the lexical to the semantic units have been trained using a low presynaptic threshold ($\vartheta_{pre}^{SL} = 0$) and a high post-synaptic threshold ($\vartheta_{post}^{SL} = 0.5$). The threshold for the excitatory synapses from the semantic features to a word-unit has been given following a similar reasoning, but with an opposite role for the pre-synaptic and post-synaptic thresholds ($\vartheta_{post}^{LS} = 0$; $\vartheta_{pre}^{LS} = 0.5$). In other words, the threshold for the word-form is always at zero, and the threshold for the semantic unit is always 0.5, independently of whether the neuron is pre-synaptic or post-synaptic. This signifies that a word-unit must be active to ensure learning. The synapse is then reinforced when the corresponding feature is present in the semantic net, and weakened when the feature is absent. As a consequence, only those features frequently participating in object representation (dominant features, in particular those occurring more than 50% of times) are spontaneously connected with the word-form. Due to the absence of connections within the lexical


net, the activity of an unstimulated word-form is close to zero, hence we do not need a threshold slightly above zero to counteract residual activity (as was necessary in the semantic net). Training the inhibitory synapses to the word-forms (anti-Hebbian learning) requires a different strategy. We need a feature that never (or rarely) participates in the semantics of a given object (say object 1), but frequently participates in the semantics of other objects, to inhibit the word-form of object 1; conversely, a feature belonging to the semantics of object 1 must not inhibit its own word-form. To reach this objective, we decided to train the inhibitory synapses whenever the feature is active in the semantic net (this is the presynaptic unit, hence we assume $\varphi_{pre}^{LS} = 0$ in Eq. (13b)). Moreover, the threshold for the post-synaptic activity (that is, the word-form) has been given a low value ($\varphi_{post}^{LS} = 0.05$) so that, if the feature and the corresponding word-form are simultaneously active, the inhibitory synapse is dramatically reduced ($\Delta V_{i,hk}^{LS} \approx -0.95\,\delta_{i,hk}^{LS}$ in Eq. (13b)). Whenever a feature is present without its word-form, the inhibitory synapse is just mildly increased ($\Delta V_{i,hk}^{LS} \approx +0.05\,\delta_{i,hk}^{LS}$). The final result is that features that sometimes participate in the object (even when non-dominant) remove their inhibition. Only those features that rarely or never participate in the object, but frequently participate in the semantics of other objects, send inhibition to object 1. The maximum saturation for the sum of excitatory synapses reaching a word-form, $\Omega_{max}$ in Eq. (17), has been chosen so that when all dominant features are present the activity of the word-form is close to the upper saturation, but even the absence of one feature causes its almost complete inhibition. This is possible thanks to the sharp sigmoidal characteristic used for the lexical units (see also Fig. 2 with sum = 1). The previous assumption, however, can be relaxed (thus allowing an object to be recognized even when a few dominant features have not been restored) by simply increasing the value of $\Omega_{max}$ (see Fig. 2 with sum = 2). The maximum saturation for inhibitory


synapses was chosen so that even a single bubble in the semantic net, which does not participate in object semantics, is able to move the activity of the word-form from upper saturation to inhibition.

3 METHOD

3.1 Object description

All simulations presented in this work have been performed using the taxonomy of objects described in Fig. 4 and Table 4. Although other taxonomies have also been used to check the algorithm, they are not presented here for brevity. The present taxonomy includes five distinct objects, with a different number of features ranging between 5 (object 2) and 9 (object 4). Moreover, the following aspects deserve attention: a) some features are shared by different objects, and govern the formation of categories; in particular, all objects share a common feature (named A in Fig. 4 and Table 4) which determines a major category (named super-category in the following). The first three objects share another two common features (named B and C). Consequently, the three features A, B and C establish a new category (named category 1) contained within the super-category. Finally, objects 4 and 5 also share two common features (named D and E); the features A, D and E set a second category (category 2) contained within the super-category. b) Features have a different input dominance, i.e., a different frequency of appearance (the frequency with which the feature occurs when the object is presented during the learning phase). This frequency, compared with the post-synaptic threshold in the learning rule, determines the emergent dominance in the model. It is worth noting that, in the model, the dominance and distinctiveness of features arise only from the object representations (and associated frequencies); i.e., all features are treated in the same way, and their differences are learned solely from their distribution among objects.
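For illustration, an object of the taxonomy can be encoded simply as a map from feature labels to frequencies of occurrence. The sketch below uses the feature labels of object 4; the frequency values are placeholders except E (50%) and H4 (60%), which are quoted in the Results, so the actual numbers of Table 4 should be substituted:

```python
# Feature -> frequency of occurrence during training (object 4 of Fig. 4 / Table 4).
# Values marked "placeholder" are NOT those of Table 4; E and H4 are quoted in the text.
OBJECT_4 = {
    "A":  1.0,   # shared by all objects (super-category), dominant - placeholder
    "D":  1.0,   # shared within category 2, dominant - placeholder
    "E":  0.5,   # shared within category 2, non-dominant (50%)
    "F4": 1.0,   # distinctive, dominant - placeholder
    "G4": 1.0,   # distinctive, dominant - placeholder
    "H4": 0.6,   # distinctive, moderately dominant (60%)
    "I4": 0.4,   # distinctive, non-dominant - placeholder
    "L4": 0.4,   # distinctive, non-dominant - placeholder
    "M4": 0.4,   # distinctive, non-dominant - placeholder
}
```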


3.2 Training the network

We assumed that lateral synapses within each semantic area ($L^{SS}$ in Eqs. 6 and 7) and the topological structure within each area are already established, and are not modified by learning. All other synapses are trained starting from initial null values. Synapse training consists of two distinct phases. During phase I the semantic network was stimulated with some object features, while all lexical units received null input (i.e., $I_{i}^{L}(t) = 0$ in Eq. 11). This corresponds to a phase in which the subject experiments with the objects, and learns their semantics, without an association with lexical items. The semantic features were given either the value $I_{ij}^{S}(t) = 0$ (the feature is not perceived) or $I_{ij}^{S}(t) = 1.0$ (a value sufficient to move the neuron close to saturation and engender a surrounding activation bubble; the feature is perceived), with a probability within the object specified as in Table 4. The overall phase consisted of 100 consecutive epochs. During each epoch, all five objects were given separately, permuted in random order (i.e., each object is presented once in each epoch) and synapses were trained through Eqs. 13-16. After training, synapses in the semantic net were kept fixed at their final values throughout phase II. During phase II, word-forms were presented to the lexical network together with the corresponding features in the semantic net; the synapses linking lexical and semantic aspects were trained via Eqs. 13-17. This phase also consisted of 100 epochs, but each epoch now comprised eight distinct presentations (the five word-forms associated with the five objects, the two word-forms associated with the two categories and one word-form associated with the super-category). We used the input $I_{i}^{L}(t) = 1$, which excites the lexical unit close to saturation. Features in the semantic network were still randomly given according to the percentages shown in Table 4. However, dominant features were now automatically evoked within each object even when their input was absent, thanks to the action of the auto-associative inter-area synapses previously trained in phase I.
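The two training phases can be sketched as a simple loop (a schematic outline under the assumptions above; `train_step` stands for the application of Eqs. (13)-(17) to the currently presented item, and all names are ours):

```python
import random

def run_phase(items, epochs, train_step):
    """One training phase: every epoch presents each item once, in random order.
    `items` maps an item name to its feature -> frequency dictionary."""
    for _ in range(epochs):
        order = list(items.items())
        random.shuffle(order)                   # random permutation within the epoch
        for name, features in order:
            # each feature is presented with probability equal to its frequency
            presented = {f for f, prob in features.items() if random.random() < prob}
            train_step(name, presented)

# Phase I: 100 epochs over the five objects, with no lexical input.
# Phase II: 100 epochs over eight word-forms (five objects, two categories,
# one super-category), each given together with its semantic features.
```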


3.3 Model testing

After training, we tested whether the model, with learned synapses, can explain some important phenomena. These are summarized in Table 5. For each of them, we describe the possible inputs used to test the model, the expected model output in case of success, and possible model outputs which would count as a failure.

4 RESULTS

In the following we will first analyze the patterns of synapses obtained from the learning procedures, and discuss them on the basis of the different roles of shared vs. distinctive features, and dominant vs. non-dominant features. Subsequently, simulation results will be presented in terms of both object naming tasks and word recognition tasks, to test how the model behaves with respect to the phenomena listed in Table 5.

4.1 Patterns of synapses

Phase I of training - Fig. 5 displays the patterns of some inter-area synapses in the semantic net as they emerge after 100 training epochs in phase I. Each panel represents the strength of synapses entering a given feature in object 4 (see Table 4 for the position) from features in the other areas. We recall that these synapses can only be excitatory in the model. Six different features are considered, all taken from object 4: the dominant feature (A) in the super-category; a shared dominant feature ( D ) in category 2; a shared non-dominant feature ( E ) in category 2; a distinctive dominant feature ( F4) in object 4; a distinctive moderately-dominant feature ( H4) in object 4; a distinctive non-dominant feature ( M4) in object 4. The shared dominant feature A receives synapses from all features in the five objects; synapses are stronger from the other shared features and weaker from the distinctive features, and are further weighted by dominance (the smaller the dominance, the smaller the synapse weight). The shared feature D in category 2 exhibits a similar pattern of entering synapses, but now synapses arrive only 26

from features in object 4 and object 5 (indeed, these are the two objects in this category) but do not arrive from the three objects in category 1, or from the super-category. The shared feature E in category 2 is non-dominant (its frequency is just 50%), hence it receives only mild synapses from the other features (let us consider the values in the color bars beside the panels). As shown below, such values are insufficient to evoke this feature. The dominant distinctive feature F4 receives significant synapses from all distinctive features in object 4 (stronger from the dominant, less strong from the non-dominant) but does not receive synapses from the shared features (and, of course, does not receive synapses from features in other objects). The synapses entering the moderately dominant feature H4 follow a similar pattern but with smaller values: indeed, this feature is at the boundary between dominance and non-dominance. Finally, the non-dominant feature M4 receives just negligible synapses from the other distinctive features of the object, and so cannot be spontaneously evoked.

The role of post-synaptic threshold on dominance - The previous synapses were obtained using a threshold for the post-synaptic activity ($\vartheta_{post}^{SS}$ in Eq. 13) as high as 0.5 (this signifies that a feature is dominant when its frequency is more than 50%). In order to unmask the effect of this parameter on dominance, we repeated phase I of training using different values for the post-synaptic threshold. By way of example, Fig. 6 shows the strength of the synapse entering the feature H4 in object 4 from feature G4, and the strength of the synapse entering feature H3 in object 3 from F3. The synaptic strength is shown at the end of training with different values of the parameter $\vartheta_{post}^{SS}$.

If a low value is used for this parameter, the synapse reaches a high value, close to saturation, within the 100 training epochs. This signifies that the corresponding post-synaptic feature can be easily evoked by the other features of the same object, hence it is dominant (this feature spontaneously comes to mind). If the threshold is progressively raised (still performing 100 training epochs for each value starting from null initial synapses) the synapse assumes lower values and the

feature loses its dominance until, at a threshold level above the frequency of feature appearance, the synapse becomes close to zero (it is worth noting that the frequency of H3 is 70%, whereas the frequency of H4 is just 60%).

Phase II of training - The pattern of synapses obtained from phase II is summarized in Figs. 7 and 8. Fig. 7 shows the strength of synapses entering a few representative features in the semantic net (chosen among all objects and all categories) from the lexical net (i.e., these are the synapses $W_{ij,h}^{SL}$ in Eq. 10). It is evident that a shared feature receives synapses from all the word-forms it belongs to (feature A receives synapses from all eight word-forms; feature B from the word-forms representing category 1 and objects 1, 2 and 3). A distinctive feature receives excitatory synapses from just a single word-form. Moreover, the strength of all these synapses reflects the degree of dominance: only dominant features receive strong synapses (see, for instance, features F1, G2, F5); features with moderate dominance receive intermediate synapses (H3, G5); non-dominant features (E, M4) receive poor synapses, unable to evoke them. Fig. 8 shows the strength of synapses entering units in the lexical net representing the eight word-forms (denoting the super-category: left upper panel; category 1 and category 2: central and right upper panels; and the five individual objects). Some aspects deserve attention. First, each word-unit is excited only by the dominant features concurrent with its recognition. Second, the sum of synapses from these dominant features is close to the parameter $\Omega_{max}$, as a consequence of the normalization rule used in Eq. 17. Since the value of $\Omega_{max}$ has been chosen to represent the minimum excitation sufficient to lead a word-unit close to saturation, this signifies that all dominant features are necessary to move the word-unit from inhibition to excitation. This strong constraint, however, can be relaxed by increasing the parameter $\Omega_{max}$ (see Fig. 2). Finally, each word-unit is inhibited by those features not participating in its semantics, but concurring with the semantics of other word-units (for instance, the word-form of an individual object is inhibited by the distinctive features of the other objects).


4.2 Model testing

The values of synapses obtained after training incorporate the differences between shared and distinctive features, and the differences between dominant and non-dominant features quite reasonably, in agreement with the qualitative schema depicted in Fig. 3 above. Then we tested whether the model with trained synapses can simulate the phenomena listed in the Method section (see Table 5). To this end, we performed both object naming tasks, in which some features are given as input, and word recognition tasks, in which single word-forms are stimulated.

4.2.1 Simulations of object naming tasks

The overall performance of the model, in terms of the number of successes over the total number of trials in the different tests, can be summarized as follows:

i) Category formation - If only some shared features are used as input, the network correctly recognizes the corresponding category and evokes the correct word-form, and does not evoke distinctive features of individual objects. This occurs in 10/10 trials (i.e., shared properties never evoked individual members). In category 1 just one dominant feature is sufficient; two dominant features are necessary to evoke category 2.

ii) Recognition of individual members - If distinctive features are stimulated, the network recognizes the corresponding object and evokes the correct word-form, without evoking features of other members. This occurs in 29/29 trials (i.e., we never observed interference from one member to another).

iii) Management of incomplete information - We tested the capacity to recover information from all cases using two distinctive features as input. Results show that two distinctive features are sufficient to evoke all dominant properties, and activate the correct word-form in objects 1, 2, 3 and 4 (number of successes 30/30). It is worth noting that not only distinctive features, but also all shared features are evoked, i.e., a shared feature may be dominant for an individual member. In

object 5 the success rate is 2/3: in fact, in one case (input F5+H5) two distinctive features were unable to evoke the remaining dominant features. iv) Parsimony  In no case is a non-dominant feature (according to the definition provided above, i.e. with frequency equal to or less than 50%) evoked by the other features. The only way to excite such a feature is to nourish it from the external input. The success rate is 50/50. Nevertheless, a non-dominant feature may still play a role in object recognition when given as input, since it contributes to the excitation of the dominant features and, in this way, helps object recognition. An example of simulation is shown in Fig. 9, with reference to the recognition of object 4: in this case just two features were given as input (the dominant feature F4 and the non-dominant feature

L4). All other dominant features ( G4, H4, A and D ) are recovered quickly. The feature H4 exhibiting the least dominance is also the last to pop out in the semantic net. At this point the correct word-form is excited in the lexical net. Non-dominant features ( E , I4, M4) are not evoked. v) Absurd inputs  We performed a total of additional 30 simulations with all five objects, by providing a wrong feature as input, together with a correct pair of distinctive features. In no case is the word-form evoked in the lexical area; hence the object is refused with a 30/30 success rate. vi) Similarity - An important property of our model is that features are topologically organized, i.e., similar features occupy proximal positions in the semantic network and tend to be active together (see the activation bubbles in the left panels of Fig. 9). The consequence is that an object might be recognized even if some of its properties are moderately altered compared with the        ing.    s ability to recognize an object even in the presence of corrupted information, we provided two distinctive features as input to the semantic net, but with their location shifted compared with the standard values used during training. The results (Tab. 6) show that with the basal parameter values the model exhibits a poor capacity to recognize a corrupted object. The only cases with a moderate success rate (7/15 or 12/28) occur when a feature is shifted by one position and the other is at the exact value. The reason for such a 30

The reason for such poor performance is that the lateral excitatory synapses in the semantic net have a narrow receptive field (σex = 0.8), so the activation bubble comprises just the features immediately adjacent to the trained position.

The network's capacity to recognize corrupted objects improves considerably if we use a wider receptive field for the lateral synapses (in particular, we used σex = 1.1 instead of 0.8). This change may replicate the effect of attentive mechanisms, which induce a wider activation in the semantic net in an effort to solve this more difficult recognition task. Now the model can perform object naming tasks correctly even if both features are modified. If both features are changed by a distance d ≤ 1, the success rate is now 100% (the number of trials in each case is shown in Table 6). The performance is still very good with greater corruption, provided at least one of the two features is a dominant one: the success rate is close to 90% if one feature is at a distance d = 2 and the other is at a distance d = 1 (the results of a simulation are summarized in Fig. 10), but satisfactory results are obtained even with features at larger distances (see Table 6). Conversely, when both features are non-dominant, the network cannot recognize an object whenever d > 1. In general, we can argue that the network is able to deal with large corruptions, especially if these affect the dominant features.
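The effect of widening the lateral receptive field can be visualized with a few lines of code (an illustration of ours; the exact form of Eq. 6 is not reported in this excerpt, so a difference-of-Gaussians profile is assumed on the basis of the parameter names in Table 2):

% Net lateral weight received by a unit at distance d from an active unit,
% assuming a difference-of-Gaussians kernel with the parameters of Table 2.
L0ex = 11; L0in = 3; sigma_in = 3.5;
d = 0:4;                                  % distance (in units) within the feature area
for sigma_ex = [0.8 1.1]
    Lnet = L0ex * exp(-d.^2 / (2*sigma_ex^2)) - L0in * exp(-d.^2 / (2*sigma_in^2));
    fprintf('sigma_ex = %.1f : net weight at d = 0..4 = %s\n', sigma_ex, mat2str(Lnet, 3));
end

Under this assumption, with sigma_ex = 0.8 the net lateral input turns inhibitory beyond the immediate neighbors, whereas with sigma_ex = 1.1 the excitation at d = 1 roughly doubles and the inhibition at d = 2 is strongly reduced, so the activation bubble can reach shifted features.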

It is worth noting that the change in parameter σex does not significantly affect the other results; the only effect is an increase in the size of the activation bubble in the semantic areas.

vii) Temporal response - We tested how the typicality of the input may affect the response time. There are two ways in which typicality can be expressed in our input data: an input is more typical than another if, given the same number of features, it contains a greater number of dominant ones; alternatively, an input is more typical if a greater number of features is perceived (for instance, a cat that is meowing, has a tail and is purring is more typical than a cat with the same properties but without a tail). Results on the temporal response are summarized in Fig. 11 in the form of a box-and-whiskers graph, where the y-axis represents the number of steps (related to the overall time interval) necessary to evoke a word-form.

In the first set of trials (upper panel in Fig. 11) we tested the role of dominant vs. marginal features in all five objects when two distinctive features are given as input. Results show that the response time depends strongly on the typicality of the inputs. A longer time is required when non-dominant features are used as input, since these must evoke the dominant features before the corresponding word-form appears. The differences (tested with the Mann-Whitney test) are statistically significant (see figure legend). The bottom panel summarizes the effect of the number of input features on the temporal response in all objects. The response time diminishes with the number of features used as input, and this dependence (tested with the Mann-Whitney test) is strongly significant (see figure legend). Finally, we tested the temporal responses in the case of corrupted objects (i.e., the same cases tested in Table 6). In all cases the network requires a longer time to recognize a corrupted object compared with one having the same input features but without corruption (results not shown for brevity).
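The response time used throughout this section is simply the number of integration steps needed for a word unit to cross a recognition threshold. A schematic simulation loop is sketched below (our own illustration, not the authors' code: the lexical-to-semantic feedback and the intra-area lateral terms are omitted, the function names are hypothetical, and the first-order dynamics is an assumed discretization consistent with the time constants, sigmoid parameters and 0.4 ms step quoted in Table 2 and in the figure legends):

% Sketch of an object naming trial: a cue (e.g., two features) is given to the
% semantic net, the network is iterated, and the step at which a word unit exceeds
% a recognition threshold is returned as an indirect measure of the response time.
function [word, steps] = naming_task(Wss, Wsl, cue)
dt = 0.4; tau_s = 3; tau_l = 1;                        % ms (Table 2, figure legends)
sig = @(u, slope, c) 1 ./ (1 + exp(-slope * (u - c))); % sigmoidal activation
x = zeros(size(Wss, 1), 1);                            % semantic units
y = zeros(size(Wsl, 1), 1);                            % lexical units (word-forms)
for steps = 1:400
    x = x + dt/tau_s * (-x + sig(Wss*x + cue, 40, 0.55));
    y = y + dt/tau_l * (-y + sig(Wsl*x,       50, 0.80));
    [ymax, word] = max(y);
    if ymax > 0.9, return; end                         % word-form recognized
end
word = NaN;                                            % no word-form evoked
end

Counting steps in this way gives the kind of measure summarized in Fig. 11: cues containing more dominant features push a word unit above threshold in fewer iterations.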

4.2.2 Simulations of word recognition tasks

In these simulations we excited a single word-form in the lexical network and observed which features were evoked in the semantic net. We say that a word is correctly recognized if it can evoke all the dominant features constituting its spontaneous semantics. The results, summarized in Table 7, confirm that each word-form evokes all of its dominant features; moreover, word-forms representing categories never evoke the distinctive features of their individual members. Two examples of simulations are reported in Figs. 12a and 12b, showing the network response to the word-form representing category 2 (Fig. 12a) and to the word-form representing object 3 (Fig. 12b). Category 2 evokes its two dominant features (A and D), while the non-dominant feature E is not evoked. Object 3 evokes its seven dominant features; the marginal feature L3 is not evoked.

5. DISCUSSION


The present paper proposes a new neural network that can automatically acquire the meaning of concrete objects and categories by exploiting the statistical concurrence of features, and can link them to word-forms. This is accompanied by an analysis of the role of Hebbian learning in associative networks, focusing on the impact of pre- and post-synaptic thresholds, in order to capture the distinctions between shared vs. distinctive and dominant vs. marginal features. Finally, a mechanism is suggested to exploit similarity among objects. Although some aspects were already present in our previous works, namely the difference between shared and distinctive features, the present study introduces an important novelty: in training the network we considered that features may occur with different frequencies, some of them being present more often than others. Conversely, in our previous versions (Ursino et al., 2009; Ursino et al., 2010), objects were always complete during the learning phase, and some features were lacking only in the recovery phase. This is a major difference, since it leads to a discrimination between dominant and marginal features. Before discussing the individual assumptions introduced in building the model, we must recognize that, although these assumptions are biologically plausible, they have not been taken from neurophysiological studies, nor do they reflect direct in vivo measurements. Hence, at present the model should be considered an artificial neural network, which can solve some interesting semantic-lexical tasks, adopting ad hoc hypotheses on the network organization and learning rules. Each of these assumptions, and the consequent results, requires extensive validation in future physiological studies. Two theoretical aspects of the model are particularly important. First, in agreement with several ideas in the literature (Barsalou, 2008; Martin, 2007; Pulvermüller & Fadiga, 2010), the model assumes that when we recognize an object, we recover the dominant features in our semantic net, i.e., we reconstruct all pieces of information frequently present in our past experience. Hence, we do not simply manipulate symbols, but reconstruct experience. Second, we assume that during an object naming task, object recognition occurs only when enough dominant information has been

recovered in the semantic net. The latter point can be tested on the basis of the response time (if we need to reconstruct the overall dominant content, this signifies that strong typicality reduces the response time for object naming, while the absence of some important features, i.e., poor typicality, increases the response time). Moreover, damage in some feature areas, which eliminates some excitation, should also eliminate the capacity to recognize objects which strongly exploit those areas. Conversely, other theories may assume that the link between the lexical and the semantic nets contributes critically to object recognition (i.e., just a few features, especially the distinctive ones, can evoke the word-form, and then the word-form evokes the remaining features). In this case, a lesser effect of typicality should be evident. In the following, the semantic net is analyzed first, followed by a focus on the connections between the semantic and the lexical aspects.

The semantic memory

We used an attractor network to simulate the semantic memory, but we introduced several important changes with respect to classic auto-associative models. First, we adopted a sparse code for object representation, in which just a bubble of adjacent neurons is active in each area, and information is topologically organized within each area. This aspect reflects two major points frequently discussed in the literature (see Kiefer & Pulvermüller, 2012): semantic representation is

distributed over different brain regions, and may involve a modal representation (i.e., one exploiting sensory and motor information, where a topological organization is more evident). Although the modal/amodal debate is not directly addressed in our model, it is clear that the network organization strongly favors a modality-dependent representation of features, and may agree with an embodied hypothesis of semantics (Barsalou, 2008). The most innovative aspect is that the use of a low threshold for the pre-synaptic neurons, together with a high threshold for the post-synaptic ones, naturally encodes the differences between sharedness and distinctiveness and between dominance and marginality. The consequence is that the

semantic network can manage quite complex taxonomies of objects. Moreover, it extracts a parsimonious representation in which marginal features are not spontaneously evoked. However, neither the idea of a topological organization within an attractor network nor the use of asymmetric Hebb rules is entirely novel. The first was proposed by von der Malsburg and Schneider in the mid-eighties (Von der Malsburg & Schneider, 1986) and used by Miikkulainen when building a semantic-lexical model (Miikkulainen, 1993, 1997). The Hebb rule proposed here shares several aspects with the well-known BCM rule (Bienenstock, Cooper, & Munro, 1982). An important aspect of the BCM rule, not considered in our work, is the possibility to adjust the post-synaptic threshold over time to avoid instability: a frequent choice is to use the average value of the post-synaptic activity in a previous period (Bienenstock et al., 1982; Cooper et al., 1979; Dayan and Abbott, 2001). The present model avoids instability by imposing a saturation value for synapses. The post-synaptic threshold in our model is crucially related to the concept of dominance, since it approximately sets the frequency that discriminates between dominance and marginality. We chose a fixed post-synaptic threshold as high as 0.5; hence only features with a frequency higher than 50% become dominant. The use of a time-varying threshold, and the analysis of its consequences, can be the subject of future studies. The concept of dominance requires a further comment. This is a fundamental emergent property of our simulations: a local property that belongs to the individual concept. We say that a feature is emergently dominant if it comes to mind spontaneously when thinking of a concept (either in response to a partial cue or in response to a word-form). In this regard, this definition resembles that used in the linguistic literature to describe feature listing tasks (Sartori and Lombardi, 2004). However, our idea is that emergent dominance closely resembles, but does not fully coincide with, the concept of characteristic features introduced by Smith, Shoben and Rips in their Feature Overlap Model (Smith et al., 1974). Characteristic features generally describe a prototype, but are not necessary for it. In other terms, they are often (but not always) present in a

concept. Prototypes are very useful to define concepts and categories that evolve naturally, which is the subject of the present work. Furthermore, the concept of dominance is important to introduce the idea of typicality, widely used in semantic tests (see below). A further important property, often used in this field, is cue validity (Rosch and Mervis, 1975), defined as the probability that an item belongs to a category, given that it has a certain feature. In our model, cue validity is stored, after learning, not only in the synapses from the features to the word-forms, but also in the synapses within the semantic net (for instance, a marginal feature does not send strong synapses to a word-form, but can have a strong cue validity by sending strong synapses to other dominant features in the semantic net; the latter, in turn, can evoke the word-form). Hence, cue validity is related to the strength of the overall path from a feature to the word-form, including all possible branches. The relationship between our model results and cue validity is quite complex and requires further investigation. In the present simulations the only factor affecting dominance is frequency. As stated in the Introduction, other elements not considered here (such as the emotional impact of a feature, or its role in object functioning) may affect dominance. Indeed, features with high emotional or functional impact may become dominant even if they occur rarely. However, they require a different learning strategy, probably a different learning rate in the Hebb rule. If a strong learning rate is used for the synapses entering an emotionally important feature, these synapses can be moved up to saturation after very few presentations (theoretically, even after a single presentation if a sufficiently strong learning rate is used). At that point, the feature has become dominant, and is evoked spontaneously by the others, even if it no longer appears in perception. This idea assumes that the learning rate is under the influence of attentional/emotional mechanisms. Finally, it is to be stressed that our model may deal with individual differences in semantics. In all the present examples, we assumed just a single set of percentages for the features (shown in Table 4) and a single post-synaptic threshold. If training is repeated assuming different percentages (that may

reflect a different individual experience) or using a different post-synaptic threshold (which may reflect a different, better or poorer, memorization capacity; see Fig. 6), a different set of dominant features would be obtained for the same object. Hence, the model may easily accommodate individual variability. Although already investigated in our previous works (Ursino et al., 2010; Ursino et al., 2011), the difference between sharedness and distinctiveness also merits a short comment. This distinction is extremely important in our model, since it allows the formation of categories (i.e., superordinate concepts), overcoming some of the classical limitations of attractor auto-associative networks. Classic attractor networks trained with the Hebb rule have strong difficulties in dealing with highly correlated objects: indeed, these networks are usually trained with orthogonal patterns, or with randomly generated patterns. In our network correlated patterns can be managed quite well, avoiding any interference: information can propagate from distinctive to shared features through strong synapses, but cannot propagate from shared toward distinctive features, thus preventing an avalanche effect of shared features toward distinctive features of other objects. As a consequence, no interference occurs, provided the distinctive properties of different objects are not given simultaneously. It is noteworthy that a property may be non-dominant for a category as a whole, but may become dominant for a given sub-category: if just some exemplars possess a given property, and these exemplars are used many times during training (associating them with a specific name), a sub-category is formed with the specific property associated with it.
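The asymmetry between shared and distinctive features described above can be reproduced with a toy two-synapse simulation (our own illustration, under an assumed training schedule of 200 epochs with the five objects presented once per epoch; the feature frequencies are those of Table 4, the learning parameters those of Table 2):

% Toy illustration of the threshold-based Hebb rule: the synapse from a distinctive
% feature (F1, present only in object 1) to a shared feature (A, present in all
% objects) saturates, while the reciprocal synapse decays toward zero.
rng(0);
lr = 0.0025; w_max = 0.1; th_post = 0.5; th_pre = 0.05;   % values from Table 2
w_f1_to_a = 0; w_a_to_f1 = 0;
for epoch = 1:200
    for obj = randperm(5)                     % the five objects, in random order
        a  = double(rand < 0.9);              % shared feature A: 90% in every object
        f1 = double(obj == 1 && rand < 0.9);  % distinctive feature F1: object 1 only
        if f1 > th_pre                        % pre-synaptic gate, synapse F1 -> A
            w_f1_to_a = min(max(w_f1_to_a + lr*(a - th_post), 0), w_max);
        end
        if a > th_pre                         % pre-synaptic gate, synapse A -> F1
            w_a_to_f1 = min(max(w_a_to_f1 + lr*(f1 - th_post), 0), w_max);
        end
    end
end
fprintf('F1 -> A: %.3f    A -> F1: %.3f\n', w_f1_to_a, w_a_to_f1);
% F1 -> A ends near w_max = 0.1; A -> F1 remains close to zero.

The synapse from F1 to A is updated only when object 1 is presented, and then the shared feature is almost always above the post-synaptic threshold; the synapse from A to F1, instead, is weakened during the presentations of all the other objects, in which F1 is silent.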

Lexical network

In our model, the lexical network behaves as a sort of convergence zone (Damasio et al., 1996; Gainotti, 2006), collecting all the conceptual information on the individual objects stored in the semantic network. Although the present model is not strictly related to biological reality, there is some evidence of a possible location of this zone in the anterior temporal lobe (Simmons and Martin, 2009; Visser, Jefferies, and Lambon Ralph, 2010). Nevertheless, it is important to stress that lexical units in our model do not store the conceptual content of the object per se, but are used to retrieve the stored information by exciting neurons in the semantic areas (in response to a lexical input), and to recognize that a piece of complete conceptual information has been reestablished (in response to a sensory or motor cue). An important point still debated in the literature (see McNorgan et al., 2011) is whether information is integrated mainly via direct connections (as in the present semantic net) or via cascaded integration sites (i.e., convergence zones, as in the present lexical-semantic connections). The present model exploits both kinds of connections, although it privileges the direct connections in the semantic net. An improved model should probably incorporate several consecutive convergence zones (that is, a convergence zone is not necessarily a lexical one), which may represent progressively more general features and can help the organization of the semantic memory. This may be analyzed in future works. In training the synapses between the semantic and the lexical nets, we made use of certain assumptions: these are currently justified only on the basis of the final network behavior, and need further validation (or rejection) in future neurophysiological studies. First, we did not use a BCM-type rule to train the excitatory synapses between the lexical and the semantic nets. Rather, we assumed that the lexical unit must be ON (i.e., externally activated) to allow a change in these synapses. Potentiation or inhibition then depends on whether the activity in the semantic net is above threshold (independently of whether it is pre-synaptic or post-synaptic). This implies that the lexical network requires a supervisor to point to the correct word during training (either word reading or word listening). An implicit result in our model is that during object naming tasks an object is recognized when its dominant features are excited: just at this point, a word-form is excited in the lexical net. This

reflects the idea that object recognition consists in the reconstruction of our past experience of an object, but according to a parsimony principle (marginal features are not evoked), and the fact that an object may be recognized even in the absence of a corresponding word (as may occur in animals), i.e., auto-association occurs entirely in the semantic net. The relationship between the semantic and the lexical nets is hetero-associative: the occurrence of one event in one network induces the corresponding event in the other. To obtain the previous result, we assumed that the sum of excitatory synapses entering each lexical unit cannot exceed a maximum level. From a neurophysiological point of view this assumption is acceptable, since it may reflect competition among synapses for a limited neurotransmitter, or for limited metabolic resources. From a functional point of view, this rule has been introduced to manage objects with a different number of dominant features. Let us consider, for instance, the word denoting category 1, with three dominant features (A, B, C), and the word denoting object 1, with six dominant features (A, B, C, F1, G1, H1). A question is: how can the first word be active when only features A, B, C are ON, while the second word requires six features to be ON simultaneously? A simple response may be that, in the first case, the incoming synapses must be twice as strong as in the second case, so that just three excitatory inputs are sufficient to move neuron activity from the OFF state to the ON state. This solution can be easily obtained during training, assuming that the sum of entering synapses is the same for each word-form. In the present work we used a sharp sigmoidal relationship for lexical units, and assumed that the sum of all excitatory inputs is just sufficient to move the unit from the OFF to the ON state. In this way, all dominant features in the semantic net are necessary to recognize a word. We claim that this assumption is computationally efficient (it is much better to have a memory that can restore all dominant properties, rather than just a part). Of course, this constraint may be relaxed, when necessary, by simply increasing the saturation of synapses (i.e., the parameter max in Eq. 17) or using a less sharp sigmoidal relationship.
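In numerical terms (our own arithmetic, with the values max = 1 and usat = 1 used in the present simulations): after normalization the word denoting category 1 receives three excitatory synapses of roughly 1/3 each, and the word denoting object 1 six synapses of roughly 1/6 each, so that in both cases

\[ 3 \cdot \tfrac{1}{3} \;=\; 6 \cdot \tfrac{1}{6} \;=\; 1 \;=\; u_{sat}, \qquad \text{while } (M-1)\cdot\tfrac{1}{M} \;=\; \tfrac{M-1}{M} \;<\; u_{sat} \]

whenever one of the M dominant features is missing, the excitatory input falls short of the value needed to bring the word unit to saturation.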


Of course, the assumption that all (or most) dominant features must be evoked to achieve object recognition does not signify that we fail to recognize objects lacking some important properties (for instance a cat without a tail). In fact, the auto-associative dynamics in the semantic net restores all lacking dominant features starting from a sufficient cue, thus allowing the recognition of objects even in the absence of important input properties. In the Results section, we checked that two features are sufficient in most cases, independently of the number of dominant features. Furthermore, we included inhibitory synapses from the semantic net toward word-forms. A feature frequently experienced in the past sends a significant inhibition to all word-forms that were never simultaneously active with it. This inhibition plays a pivotal role in our model: it prevents the conceptual representation of a member of a category (for instance object 1, with all features A, B, C, F1, G1 and H1) from activating not only its word-form, but also all its super-ordinate concepts (in this case, category 1 with features A, B, C, and the super-category with feature A). For instance, a conceptual representation containing the distinctive features of object 1 inhibits the word-forms denoting category 1 and the super-category, so that only the word-form of object 1 is evoked. Furthermore, as shown in the Results section, inhibition allows the network to refuse objects with an absurd feature (i.e., a feature that had never occurred with that object before, like a dog with wings). The latter point (absurd content rejection) has been introduced to have a smart network, which refuses the absurd; however, this does not necessarily correspond to real human behavior. For instance, it is probable that a person can still recognize a cat even if, due to some trick, it appears to be barking. In this case, to allow recognition, the inhibition should be suppressed in the model. In other words, the model without inhibitory synapses can accept an absurd concept. A limitation of the present model is that we used a localist representation of the lexical aspects (whereas semantics is distributed over different areas). This choice was made to limit the computational complexity (especially during the training phase) and can be overcome in future

model extensions, for instance using a distributed representation of orthographic or phonological features. A further limitation is that we completely separated the training phase of the semantic network (phase I) from the training phase of the lexical-semantic connections (phase II). Although the lexical layer sends feedback synapses back to the semantic net, and so can improve the restoration of the dominant aspects of a word, most of the semantic representation is learned during phase I. A more sophisticated relationship between lexical and semantic aspects may be achieved by assuming that inter-area synapses in the semantic net are plastic also in phase II. This might help achieve a more sophisticated context-dependent semantics, in agreement with the idea that language modulates ongoing cognitive and perceptual processing in a flexible and task-dependent manner (Lupyan, 2012).

Lastly, some of the results of the present simulations open the way to new investigations to be carried out in future studies. An interesting aspect of our model is the possibility to manage similarity among objects within the semantic store. We showed in the Results section that the network can recognize an object (and evoke the corresponding word-form) even if some features are modified with respect to the values used during training. This possibility depends on the topological organization of features in each area and on the width of the lateral synapses that control the activation bubble. Narrow lateral synapses were used during the training phase, resulting in sharp prototypical objects stored in the auto-associative semantic network. Then, we used wider synapses to manage a difficult task, in which the subject must recognize an object despite wider feature changes. One possibility is that lateral synapses are under the control of an attention mechanism, which modifies the activation bubble depending on the difficulty of the task. Another possibility is the use of wider synapses directly in the training phase, thereby realizing less precise prototypes. The present model also provides some indications on the response time required to complete an object naming task (Fig. 11). A significant result is that the response time depends on the typicality

of the object used as input, compared with the prototype of its category; i.e., the reaction time of the response decreases as the typicality of the test instance increases. This effect was first suggested by the Feature Overlap Model (Smith et al., 1974) and then verified in subsequent studies with different paradigms (McCloskey & Glucksberg, 1978; Larochelle & Pineau, 1994; Rips et al., 1973; Kiran and Thompson, 2003). Our model agrees with these results, providing a possible explanation. If a typical input is given to the model (i.e., one containing all or almost all dominant features), the overall information necessary to activate the word-form is immediately available. Conversely, if an input contains just a few dominant features, the dynamics of the auto-associative semantic net must first evoke the remaining dominant features: hence, the information required to activate the word-form becomes available only later. An alternative would be a model in which a single distinctive dominant feature sends a synapse to the word-form strong enough to activate it immediately, even before the other dominant features are evoked in the semantic net. In the latter model, the typicality effect would be much less evident. In line with this central assumption, the model furnishes various predictions on the response times, which may be tested in future works. In particular, the response time in the model depends on the following aspects:

a) The number of dominant features: the smaller the number of dominant features given as input, the longer the response time (see Fig. 11, upper panel, as an example).

b) The presence of distinctive features: a distinctive dominant feature is more important than a shared dominant feature; without distinctive features, an object cannot be recognized.

c) The presence of marginal features: marginal features can help object recognition, but they require more time if they are not associated with dominant features; they must recall dominant features first, before object recognition occurs (see Fig. 11, upper panel).


d) The number of features: the response time decreases with the cardinality of the input (Fig. 11, bottom panel). In particular, if marginal features are added to dominant features, the response time decreases further; marginal features provide an additional input, which hastens the response.

e) The presence of corrupted features: features moderately shifted with respect to the values used during training are still able to evoke an object, but require more time; they must recall the prototypical features via the lateral intra-area synapses before recognition occurs.

The predictions in points a-e may become the target of ad hoc experiments, in which the number, dominance and similarity of the input features are manipulated, and the actual response times are collected. Furthermore, the model might be used to simulate additional aspects: i) the effect of priming (by providing a preliminary excitation to some features, or a similar word to the lexical net, just before starting the object recognition task); ii) the results of feature listing tasks; iii) the effect of damage in specific areas of the semantic net, to mimic the condition of neurological patients with local brain lesions. To this end, we are developing a version of the model implementing real objects described with realistic attributes.

In conclusion, we developed a neural network model of the semantic-lexical system, based on a distributed representation of features, consistent with a modal representation of our concepts, and exploiting auto-association and hetero-association with Hebb rules to learn from input statistics. Mechanisms are hypothesized to explain how complex semantics, including multiple categories, similarity and a differentiation between dominant and marginal features, can be extracted from the statistics of the inputs, and to show how word-forms can be associated with this conceptual representation avoiding errors among superordinate and subordinate concepts (category and members, respectively). The network may represent a tool to evaluate the results of experimental tests, assess psychological theories, and devise artificial systems for object representation.


APPENDIX 1

For the sake of simplicity, let us assume $\theta^{SS}_{pre} = 0$ in Eq. 13. The condition $\theta^{SS}_{pre} = 0.05$ causes just mild differences compared with the present analysis, producing a further modest reduction in dominance. Moreover, let us assume that, during training, the activity of the pre-synaptic semantic unit may assume just two values: $x^{S}_{hk} = 0$ (silent) and $x^{S}_{hk} = 1$ (maximally active). Since the pre-synaptic threshold is zero, training occurs just when $x^{S}_{hk} = 1$. In this condition, Eq. 13 applied to the semantic synapses becomes:

$$\Delta W^{SS}_{ij,hk} = \begin{cases} \gamma^{SS}_{ij,hk}\left(x^{S}_{ij} - \theta^{SS}_{post}\right) & \text{if } x^{S}_{hk} = 1 \\ 0 & \text{if } x^{S}_{hk} = 0 \end{cases} \qquad \text{(A1)}$$

Hence, the average value of the effective synaptic change (computed by averaging just the upper term in Eq. A1) is

$$E\left\{\Delta W^{SS}_{ij,hk}\right\} = \gamma^{SS}_{ij,hk}\left(E\left\{x^{S}_{ij}\right\} - \theta^{SS}_{post}\right) \qquad \text{(A2)}$$

where $E\{\cdot\}$ denotes the expected value, so that $E\{x^{S}_{ij}\}$ represents the frequency at which the feature ij is present during training of the given object. Here we assume that $x^{S}_{hk}$ and $x^{S}_{ij}$ are generated independently; hence $E\{x^{S}_{ij}\}$ is the same regardless of the value of $x^{S}_{hk}$.

The previous equation signifies that the synapse tends to progressively increase toward its maximum saturation level if $E\{x^{S}_{ij}\} > \theta^{SS}_{post}$, or to decrease toward zero if $E\{x^{S}_{ij}\} < \theta^{SS}_{post}$. The time required to reach these upper or lower levels depends on the learning rate $\gamma^{SS}_{0}$ and on the difference in the right-hand member of Eq. A2, but the final level is reached after a sufficient number of steps. As a consequence, $\theta^{SS}_{post}$ sets the frequency which discriminates between dominant and non-dominant features. In reality, however, this frequency may be somewhat higher than $\theta^{SS}_{post}$, since neural units are not exactly at the values 0 and 1, and moreover we adopted a value for the pre-synaptic threshold a little above 0, to avoid any residual activity in a pre-synaptic unit contributing to a moderate synapse reinforcement when not required.

In the present simulations we set $\theta^{SS}_{post} = 0.5$, signifying a dominance level slightly higher than 50%.


LEGENDS TO FIGURES

Figure 1 - Schematic diagram describing the general structure of the network. The model comprises nine Feature Areas (upper shaded squares; each square represents a different feature area composed of 20x20 neural units). Units are connected with other units in the same area via lateral excitatory and inhibitory intra-area synapses (similarity), and with units in different areas via excitatory inter-area synapses (auto-association). The Lexical Area consists of 8 elements (lower shaded square). Units in the semantic network receive excitatory synapses from lexical units; lexical units receive both excitation and inhibition from semantic units.

Figure 2 - An example of how a unit in the lexical layer can be activated by the concurrence of several dominant features in the semantic net. The figure represents the input-output sigmoidal relationship of a word-form, where usat = 1 is the input level necessary to excite the neuron up to saturation. We assume that the word-form is associated with M dominant features, and that the sum of the synapses entering the unit is assigned. Let us neglect a possible difference in dominance (that is, let us assume that, after sufficient training, all synapses have a similar value) and assume that an active feature in the semantic net has an output close to 1. If the sum of the entering synapses is equal to usat = 1 (as in the present simulations), all M dominant features are necessary to move the working point from 0 to the upper saturation limit. Conversely, if one uses a sum as high as 2·usat, just M/2 dominant features are sufficient to activate the word-form.

Figure 3 - Qualitative example of synaptic connections among shared and distinctive features, and among dominant and marginal features, showing the required asymmetry of the auto-associative network.

Figure 4 - Object taxonomy used in the present simulations. We have 5 distinct objects, each with a different number of features. Each feature has a different frequency of occurrence, indicated in Table 4. Shared features represent categories. In this particular example, feature A is shared by all objects, and defines a super-category. Features B and C are shared by objects 1, 2 and 3, and define a first category. Features D and E are shared by objects 4 and 5 and define a further category.

Figure 5 - An example of the synapses connecting features in the semantic network after training. The network has been trained using the 5 objects described in Figure 4 and Table 4. To orient the reader, the large right panel displays the position of the features within the overall semantic matrix of 60x60 units, organized into nine feature areas, each of 20x20 units. On the left, each of the six panels depicts, by color, the strength of the synapses entering the specified post-synaptic feature in object 4, coming from all the different pre-synaptic units in the semantic net. Shared

features (upper panels) receive synapses from all objects in their category; distinctive features (lower panels) receive synapses only from features of object 4. The strength reflects the dominance.

Figure 6 - Effect of the post-synaptic threshold (θpost^SS in Eq. 13) on feature dominance in the

semantic net. The figure shows the strength of the synapses from feature F3 to H3 in object 3, and from feature G4 to H4 in object 4, after 100 training epochs, using different values of the post-synaptic threshold. The synapse progressively falls to zero (and so the feature becomes non-dominant) when the frequency of the post-synaptic feature (70% for H3, 60% for H4) becomes smaller than the threshold.

Figure 7 - An example of the synapses from word-forms in the lexical net to some features in the semantic net after training. The synapses have been trained using the 8 word-forms as input to the lexical net and the corresponding features as input to the semantic net (see Table 4). Each of the nine panels represents the strength of the synapses reaching the specified post-synaptic semantic unit, coming from all the different pre-synaptic word-forms in the lexical net. Shared features receive synapses from different word-forms (e.g., feature A receives synapses from all 8 word-forms); distinctive features receive synapses only from one word-form. The strength reflects the dominance of the post-synaptic feature.

Figure 8 - Synapses from features in the semantic net to word-forms in the lexical net after training. The synapses have been trained using the eight word-forms as input to the lexical net and the corresponding features as input to the semantic net (see Table 4). For clarity, the large bottom panel displays the position of the features within the overall semantic matrix. Each of the eight top panels depicts, by color, the strength of the synapses reaching the specified post-synaptic lexical unit, coming from all the different pre-synaptic units in the semantic net (all eight word-forms are represented). Categories (upper panels) receive excitatory synapses only from shared dominant features; words representing individual objects receive excitatory synapses both from shared and from distinctive dominant features of their corresponding object. The sum of the excitatory synapses is constant thanks to normalization. Finally, all word-forms receive inhibition from features previously used, but belonging to other objects.

Figure 9 - Simulation of an object recognition task, in which the subject must recognize object 4, when 2 features (F4, dominant, at position 35,5; L4, non-dominant, at position 50,35) are given as input. The 4 rows represent activity in the semantic network (left panels) and in the lexical network (right panels) at different steps of the simulation (duration of each step: 0.4 ms). As evident in the left panels, the two features progressively evoke the other 4 dominant features in the semantic

network; when the overall conceptual representation is reconstructed, the word-form is activated in the lexical area (position 7).

Figure 10 - Simulation of an object recognition task, in which the subject must recognize object 1 when two corrupted (shifted) features are

given as input (F1, dominant, at position 27,5, i.e., shifted by 2 positions; G1, dominant, at position 26,25, shifted by 1 position). The simulation was performed by increasing the extension of the lateral excitatory synapses (σex = 1.1 instead of 0.8 in Eq. 6) to mimic greater attention. The 6 rows represent activity in the semantic network (left panels) and in the lexical network (right panels) at different steps of the simulation (duration of each step: 0.4 ms). As evident in the left panels, the 2 features progressively evoke the other 4 dominant features in the semantic network despite their shift, by creating a wider activation bubble; when the overall conceptual representation is reconstructed, the word-form is activated in the lexical area (position 4).

Figure 11 - Effect of typicality on the response time during object naming tasks. The y-axis of each panel represents the number of simulation steps necessary to evoke the correct word-form in response to a given input (hence, an indirect measure of the response time). Results are summarized as a box-and-whiskers graph, where the box represents the first and third quartiles, the central line is the median, and the vertical bar represents the variability. The upper panel summarizes the response times obtained with the 5 objects, when just 2 distinctive features were given as input. Three groups are considered: Dominant-Dominant (11 cases), Dominant-Marginal (17 cases), Marginal-Marginal (3 cases). The differences are statistically significant (Mann-Whitney test), although in the last group the test is jeopardized by the small number of cases. The bottom panel was obtained by giving a different number of features (2, 3, 4 or 5, with at least 2 distinctive features) to all objects, but without using the common feature A, to evaluate the effect of the feature number (2 features: 31 cases; 3 features: 62 cases; 4 features: 40 cases; 5 features: 27 cases). The differences are statistically highly significant (Mann-Whitney test).

Figure 12 - Simulation of 2 word recognition tasks, in which the subject must recover the semantic meaning of a word-form given as input to the lexical area. Figure 12a shows the network behavior in response to the word-form representing category 2 (position 3 in the lexical net). The 2 semantic attributes of this category readily emerge. Figure 12b shows the network behavior in response to the word-form representing object 3 (position 6 in the lexical net). The 5 most dominant features are rapidly evoked; the less dominant features (H3, with 70% frequency, position 25,55, and I3, with 60% frequency, position 50,5) are evoked after a certain delay. Finally, the marginal feature L3 (frequency 40%) is not evoked.

Table 1
Dimensions used to characterize semantic features and distinction between input (independent) characteristics, i.e., determined by the environment and context, and output (dependent) characteristics acquired by the model after learning. Some behavioural outputs are also listed.

Inputs                             Outputs                        Behavior
Distinctiveness vs sharedness      Category vs. members           Inheritance
Correlation                        Hierarchical organization      Generalization
Frequency of occurrence            Dominance                      Robustness
Emotional or functional impact     Defining vs. characteristic    Accessibility
                                                                  Response times
                                                                  Priming

Table 2
Parameter numerical values used throughout the present simulations

Semantic network
Meaning                                       Symbol        Value
Time constant                                               3 ms
Sigmoid slope                                 p             40
Sigmoid position                                            0.55
Strength of lateral excitation                L0EX          11
Standard deviation of lateral excitation      σex           0.8
Strength of lateral inhibition                L0IN          3
Standard deviation of lateral inhibition      σin           3.5
Post-synaptic threshold SS                    θpost^SS      0.5
Pre-synaptic threshold SS                     θpre^SS       0.05
Learning rate SS                              γ0^SS         0.0025
Maximum synaptic strength SS                  Wmax^SS       0.1

Lexical network
Meaning                                       Symbol        Value
Time constant                                               1 ms
Sigmoid slope                                 p             50
Sigmoid position                                            0.8
Post-synaptic threshold SL                    θpost^SL      0.5
Pre-synaptic threshold SL                     θpre^SL       0.0
Learning rate SL                              γ0^SL         0.05
Maximum synaptic strength SL                  Wmax^SL       2
Post-synaptic threshold LS                    θpost^LS      0.0
Pre-synaptic threshold LS                     θpre^LS       0.5
Learning rate LS                              γ0^LS         0.006
Maximum sum of synapses LS                    max           1.0
Post-synaptic threshold LS                    θpost^LS      0.05
Pre-synaptic threshold LS                     θpre^LS       0.0
Learning rate LS                              γ0^LS         0.03
Maximum synaptic strength LS                  Vmax^LS       0.06

Table 3
Effect of post-synaptic (xpost) and pre-synaptic (xpre) activities on synapse changes in the semantic net

[xpost  xpre]    ΔWij,hk^SS                                        Effect
[0  0]           0                                                 No change
[0  1]           -γij,hk^SS · 0.5 · 0.95 = -γij,hk^SS · 0.475      Strong weakening
[1  0]           -γij,hk^SS · 0.5 · 0.05 = -γij,hk^SS · 0.025      Mild weakening
[1  1]           +γij,hk^SS · 0.5 · 0.95 = +γij,hk^SS · 0.475      Strong reinforcement

Table 4
Position of the different features in the semantic network and percentage of their occurrence. The first index in the third column denotes the feature area, the other two indexes the position within the area. However, a 60x60 array has been used in the program (with just two subscripts) to reduce the computational complexity.

Object        Feature    Position     Percentage
Super Cat     A          1, 10, 10    90%
Category 1    B          2, 10, 5     90%
              C          3, 10, 5     70%
Category 2    D          2, 10, 15    90%
              E          3, 10, 15    50%
Object 1      F1         4, 5, 5      90%
              G1         5, 5, 5      80%
              H1         6, 5, 5      60%
Object 2      F2         4, 5, 10     90%
              G2         5, 5, 10     80%
Object 3      F3         4, 5, 15     100%
              G3         5, 5, 15     80%
              H3         6, 5, 15     70%
              I3         7, 10, 5     60%
              L3         8, 10, 5     40%
Object 4      F4         4, 15, 5     90%
              G4         5, 15, 5     90%
              H4         6, 15, 5     60%
              I4         7, 10, 15    40%
              L4         8, 10, 15    50%
              M4         9, 10, 10    50%
Object 5      F5         4, 15, 15    80%
              G5         5, 15, 15    70%
              H5         6, 15, 15    50%

Position of the different word-forms in the lexical network

Word-form         Position
Super-category    1
Category 1        2
Category 2        3
Object 1          4
Object 2          5
Object 3          6
Object 4          7
Object 5          8

Table 5
Different phenomena tested with the model after training. For each phenomenon, the inputs refer to 1) an object naming task or 2) a word recognition task.

Category recognition
  Objective: To manage categories without evoking individual members.
  Inputs: 1) Shared features without distinctive features; 2) Category word-form.
  Correct behavior: Activation of the shared features of the category and of the category word-form.
  Wrong behavior: Activation of some distinctive features of individual members; activation of some member word-forms.

Member recognition
  Objective: To manage the semantic content of a member, without evoking the category and without evoking other members of the same category.
  Inputs: 1) Distinctive features + shared features; 2) Member word-form.
  Correct behavior: Activation of the distinctive and shared features of the member only, and of the member word-form.
  Wrong behavior: Activation of distinctive features of other members; activation of the category word-form and of word-forms of other members.

Reconstruction from incomplete input
  Objective: A few features can evoke the main object content.
  Inputs: 1) Two features as input.
  Correct behavior: All dominant features are evoked.
  Wrong behavior: Failure to evoke some dominant features.

Parsimony
  Objective: To neglect the less important properties of objects.
  Inputs: 1) Absence of some marginal features; 2) Word-form.
  Correct behavior: The marginal features are not spontaneously evoked.
  Wrong behavior: Marginal features are evoked.

Absurd content rejection
  Objective: To refuse an absurd semantic content.
  Inputs: 1) Distinctive features + shared features + at least one impossible feature.
  Correct behavior: The word-form is not evoked.
  Wrong behavior: Activation of the word-form despite the absurd content.

Similarity
  Objective: To recognize objects similar to the learned ones.
  Inputs: 1) Features with some moderate change.
  Correct behavior: The word-form is evoked.
  Wrong behavior: Failure to evoke the word-form.

Typicality
  Objective: A typical object given as input should be recognized more promptly than an atypical one.
  Inputs: 1) Features with a different dominance or a different number.
  Correct behavior: The more typical the input, the shorter the response time.
  Wrong behavior: The response time is independent of typicality or contradicts typicality.

Table 6
Effect of similarity on object recognition. The table summarizes the number of successes / number of trials in object naming tasks when two distinctive features (either dominant = D or non-dominant = ND) are given as input to the semantic network. Each property may be shifted by a distance d with respect to the value used during training. In the upper table a normal receptive field has been used for the lateral synapses (σex = 0.8); in the bottom table the receptive field has been increased (σex = 1.1), to simulate greater attention.

σex = 0.8
            D(d=0)   D(d=1)   D(d=2)   D(d=5)   ND(d=0)   ND(d=1)   ND(d=2)   ND(d=5)
D(d=0)      14/14    12/28    0/10     0/10     14/15     7/15      0/15      0/15
D(d=1)      12/28    0/10     0/10     0/10     1/15      0/15      0/15      0/15
D(d=2)      0/10     0/10     0/10     0/10     0/15      0/15      0/15      0/15
ND(d=0)     14/15    1/15     0/15     0/15     3/3       0/6       0/6       0/6
ND(d=1)     7/15     0/15     0/15     0/15     0/6       0/6       0/6       0/6
ND(d=2)     0/15     0/15     0/15     0/15     0/6       0/6       0/6       0/6

σex = 1.1
            D(d=0)   D(d=1)   D(d=2)   D(d=5)   ND(d=0)   ND(d=1)   ND(d=2)   ND(d=5)
D(d=0)      14/14    26/26    11/13    10/16    13/13     13/13     9/10      8/9
D(d=1)      26/26    7/7      12/14    9/15     13/13     9/9       9/10      8/9
D(d=2)      11/13    12/14    8/16     4/19     9/9       9/10      6/13      2/10
D(d=5)      10/16    9/15     4/19     0/6      2/11      1/10      0/9       0/9
ND(d=0)     13/13    13/13    9/9      2/11     3/3       6/6       0/6       0/6
ND(d=1)     13/13    9/9      9/10     1/10     6/6       6/6       0/6       0/6
ND(d=2)     9/10     9/10     6/13     0/9      0/6       0/6       0/6       0/6
ND(d=5)     8/9      8/9      2/10     0/9      0/6       0/6       0/6       0/6

Table 7
Results of different word-recognition tasks, in which a word-form is given as input to the lexical network, and the features emerging from network dynamics in the semantic network are evaluated. It is worth noting that only dominant features (percentage at least 60%) are evoked.

Word-form         Features
Super-Category    A
Category 1        A, B, C
Category 2        A, D
Object 1          A, B, C, F1, G1, H1
Object 2          A, B, C, F2, G2
Object 3          A, B, C, F3, G3, H3, I3
Object 4          A, D, F4, G4, H4
Object 5          A, D, F5, G5

References Amit, D. (1989) Modelling Brain Function: the Word of Attractor Neural Networks. Cambridge University Press; Cambridge. Anderson, J.R. (1983). A spreading activation theory of memory. J Verb Learn Verb Behav, 22, 261-295. Barsalou, L.W. (1982). Context-independent and context-dependent information in concepts. Me m

Cognition, 10, 82-93. Barsalou, L.W. (2008). Grounded Cognition. Annu Rev Psychol, 59, 617-645. Ben-Yishai, R., Bar, O., & Sompolinsky, H. (1995). Theory of orientation tuning in visual cortex.

Proc. Natl. Acad. Sci. U.S.A . 92, 38443848. Bienenstock, E. L., Cooper L., &

Munro P. (1982). Theory for the development of neuron

selectivity: orientation specificity and binocular interaction in visual cortex. The Journal of

Neuroscience, 2 (1), 3248. Cappa, S.F. (2008). Imaging studies of semantic memory. Curr Opin Neurol , 21(6), 669-75. Cooper, L.N., Lieberman, F., & Oja, E. (1979). A theory for the acquisition and loss of neuron specificity in visual cortex . Biol. Cybern., 33: 9-28. Cree, G. S., McNorgan, C., & McRae, K. (2006). Distinctive features hold a privileged status in the computation of word meaning: implications for theories of semantic memory. J. Exp. Psychol.

Learn. Mem. Cogn., 32, 643658. Cuppini, C., Magosso, E., & Ursino, M. (2009). A neural network model of semantic memory linking feature-based object representation and words. Biosystems, 96, 195-205. Damasio, A.R. (1989). Time-locked multiregional retroactivation: a systems level proposal for the neural substrates of recall and recognition. Cognition, 33, 25-62.

56

Damasio, A. (2010). Self Comes to Mind: Constructing the Conscious Brain. London: William Heinemann. Damasio, H., Grabowski, T.J., Tranel, D., Hichwa, R.D., & Damasio, A.R. (1996). A neural basis for lexical retrieval. Nature, 380, 499-505. Dayan P & Abbott L.F. (2001). Theoretical Neuroscience. Cambridge, MA: The MIT Press. Gainotti, G. (2006). Anatomical functional and cognitive determinants of semantic memory disorders. Neuroscience and Behavioral Reviews, 30, 577-594. Gibbs, R.W. (2003). Embodied experience and linguistic meaning. Brain Lang, 84, 1-15. Hart, J., Anand, R., Zoccoli, S., Maguire, M., Gamino, J., Tillman, G., King, R., & Kraut, M.A. (2007). Neural substrates of semantic memory. J Int Neuropsychol Soc, 13, 865880. Hinton, G. E. (1981). Implementing semantic networks in parallel hardware. In G. E. Hinton and J. A. Anderson, (Eds.), Parallel Models of Associative Memory, 161-187. Hillsdale, NJ: Erlbaum. Hinton, G. E. (1986). Learning distributed representations of concepts. In Proceedings of the Eighth

Annual Conference of the Cognitive Science Society, pp. 1-12. Hillsdale, NJ: Erlbaum. Hinton, G.E. & Shallice, T. (1991). Lesioning an attractor network: Investigations of acquired dyslexia. Psychological Review, 98, 74-95. Hoenig, K., Sim, E.J., Bochev, V., Herrnberger, B., & Kiefer, M. (2008). Conceptual Flexibility in the Human Brain: Dynamic Recruitment of Semantic Maps from Visual, Motor, and Motionrelated Areas. Journal of Cognitive Neuroscience, 20, 17991814. Hopfield, JJ. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Nat. Acad. Sci. (USA), 81, 3088-3092. Huette S., & Anderson S. (2012). Negation without symbols: the importance of recurrence and context in linguistic negation. J. Integr. Neurosci ., 11, 295-312. 57

Kawamoto, A. H. (1993). Nonlinear dynamics in the resolution of lexical ambiguity: A parallel distributed processing account. Journal of Memory and Language, 32(4), 474-516. Kiefer, M. (2005). Repetition priming modulates category-related effects on event-related potentials: Further evidence for multiple cortical semantic systems. Journal of Cognitive

Neuroscience, 17(2), 199-211. Kiefer, M., & Pulvermüller, F. (2012). Conceptual representations in mind and brain: Theoretical developments, current evidence and future directions. Cortex, 48, 805-825. Kiran, S.,

& Thompson, C. K. (2003). Effect of typicality on online category verification of

animate category exemplars in aphasia. Brain Lang. 85: 441450. Larochelle, S,, Pineu, H. (1994). Determinants of response time in the semantic verification task.

Journal of Memory and Language 33:796823. Lupyan, G. (2012). Linguistically modulated perception and cognition: the label-feedback hypothesis. F ront Psychol. 2012 Mar 8; 3:54. doi: 10.3389/fpsyg.2012.00054. Martin, A. (2007). The Representation of Object Concepts in the Brain. Annual Review of

Psychology, 58, 25-45. McCloskey, M.E., Glucksburg, S. (1978) Natural categories. Well defined or fuzzy sets? Memory

and Cognition 6: 462472. Mc Norgan, C., Reid, J., & McRae, K. (2011). Integrating conceptual knowledge within and across representational modalities. Cognition 118: 211-233. McRae, K. (2004). Semantic memory: some insights from feature-based connectionist attractor networks. In B.H. Ross (Eds.), The psychology of learning and motivation: advances in

research and theory. San Diego, CA: Academic Press. McRae, K., Cree, G. S., Seidenberg, M. S., & McNorgan, C. (2005). Semantic feature production norms for a large set of living and nonliving things. Behavioral Research Methods,

Instruments, and Computers, 37, 547-559. 58

McRae, K., de Sa, V. R., & Seidenberg, M. S. (1997). On the nature and scope of featural representations of word meaning. J. Exp. Psychol. Gen.,126, 99!130. Miikkulainen, R. (1993). Subsymbolic Natural Language Processing: An Integrated Model of

Scripts, Lexicon, and Memory. Cambridge, MA: MIT Press. Miikkulainen, R. (1997). Dyslexic and category-specific aphasic impairments in a self-organizing feature map model of the Lexicon. Brain Lang., 59, 334!366. Moll, M., & Miikkulainen, R. (1997). Convergence-zone episodic memory: Analysis and simulations. Neural Networks, 10, 1017-1036. Montefinese, M., Ambrosini, E., Fairfield, B., Mammarella, N., (2013) Semantic significance: a new measure of feature salience. Mem. Cognit. DOI 10.3758/s13421-013-0365-y. "                   Network: Dynamics of Learning and Computations. Cognitive Science, 33, 665!708. Plaut, D.C. (1995). Double dissociation without modularity: Evidence from connectionist neuropsychology. Journal of Clinical and Experi mental Neuropsychology, 17, 291-321. Plaut, D.C., and Shallice, T. (1993). Deep dyslexia: A case study of connectionist neuropsychology.

Cognitive Neuropsychology, 10, 377-500. Pulvermüller, F., & Fadiga, L. (2010). Active perception: Sensorimotor circuits as a cortical basis for language. Nature Reviews Neuroscience, 11(5), 351-360. Randall, B., Moss, H.E., Rodd, J.M., Greer, M., & Tyler, L.K. (2004). Distinctiveness and Correlation in Conceptual Structure: Behavioral and Computational Studies. J Exp Psychol

Learn Mem Cogn, 30, 393-406. Rips, L.J., Shoben, E.J., Smith, E.E. (1973). Semantic distance and the verification of semantic distance. Journal of Verbal Learning and Verbal Behavior ;12:1!20.

59

Rogers, T. T., Lambon Ralph, M. A., Garrard, P., Bozeat, S., McClelland, J. L., Hodges, J. R., & Patterson, K. (2004). The structure and deterioration of semantic memory: A neuropsychological and computational investigation. Psychological Review, 111, 205-235.
Rogers, T. T., & McClelland, J. L. (2004). Semantic cognition: A parallel distributed processing approach. Cambridge, MA: MIT Press.
Rogers, T. T., & McClelland, J. L. (2008). Précis of Semantic Cognition: A Parallel Distributed Processing Approach. Behavioral and Brain Sciences, 31, 689-749.
Rosch, E., & Mervis, C.B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7, 573-605.
Rosch, E., Mervis, C., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.
Rumelhart, D. E. (1990). Brain style computation: Learning and generalization. In S. F. Zornetzer, J. L. Davis, & C. Lau (Eds.), An Introduction to Neural and Electronic Networks, pp. 405-420. San Diego, CA: Academic Press.
Rumelhart, D. E., & Todd, P. M. (1993). Learning and connectionist representations. In D. E. Meyer & S. Kornblum (Eds.), Attention and Performance XIV: Synergies in Experimental Psychology, Artificial Intelligence, and Cognitive Neuroscience, pp. 3-30. Cambridge, MA: MIT Press.
Sartori, G., & Lombardi, L. (2004). Semantic relevance and semantic disorders. Journal of Cognitive Neuroscience, 16, 439-452.
Seidenberg, M.S., & McClelland, J.L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.
Siekmeier, P. J., & Hoffman, R. E. (2002). Enhanced semantic priming in schizophrenia: a computer model based on excessive pruning of local connections in association cortex. British Journal of Psychiatry, 180, 345-350.

Silberman, Y., Bentin, S., & Miikkulainen, R. (2007). Semantic boost on episodic associations: An empirically-based computational model. Cognitive Science, 31, 645-671.
Simmons, W.K., & Martin, A. (2009). The anterior temporal lobes and the functional architecture of semantic memory. Journal of the International Neuropsychological Society, 15(5), 645-649.
Smith, E. E., Shoben, E. J., & Rips, L. J. (1974). Structure and process in semantic memory: A featural model for semantic decisions. Psychological Review, 81, 214-241.
Taylor, K.I., Devereux, B.J., Acres, K., Randall, B., & Tyler, L.K. (2012). Contrasting effects of feature-based statistics on the categorisation and basic-level identification of visual objects. Cognition, 122, 363-374.
Treves, A. (1993). Mean-field analysis of neuronal spike dynamics. Network: Computation in Neural Systems, 4, 259-284.
Tyler, L.K., & Moss, H.E. (2001). Towards a distributed account of conceptual knowledge. Trends in Cognitive Sciences, 5, 244-252.
Ursino, M., Cuppini, C., & Magosso, E. (2010). A computational model of the lexical-semantic system, based on a grounded cognition approach. Frontiers in Psychology, 1, 221.
Ursino, M., Cuppini, C., & Magosso, E. (2011). An integrated neural model of semantic memory, lexical retrieval and category formation, based on a distributed feature representation. Cognitive Neurodynamics, 1-25.
Ursino, M., Magosso, E., & Cuppini, C. (2009). Recognition of abstract objects via neural oscillators: Interaction among topological organization, associative memory and gamma band synchronization. IEEE Transactions on Neural Networks, 20, 316-335.
Vigliocco, G., Vinson, D.P., Lewis, W., & Garrett, M.F. (2004). Representing the meanings of object and action words: The featural and unitary semantic space hypothesis. Cognitive Psychology, 48, 422-488.
Visser, M., Jefferies, E., & Lambon Ralph, M.A. (2010). Semantic processing in the anterior temporal lobes: A meta-analysis of the functional neuroimaging literature. Journal of Cognitive Neuroscience, 22(6), 1083-1094.
Von Der Malsburg, C., & Schneider, W. (1986). A neural cocktail-party processor. Biological Cybernetics, 54, 29-40.

Figure 1

Figure 2

Figure 3

Figure 4
Super-category (A)
Category 1 (A B C): Object 1 (A B C F1 G1 H1), Object 2 (A B C F2 G2), Object 3 (A B C F3 G3 H3 I3 L3)
Category 2 (A D E): Object 4 (A D E F4 G4 H4 L4 I4 M4), Object 5 (A D E F5 G5 H5)
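For concreteness, the taxonomy in Figure 4 can be written down as binary feature vectors. The Matlab fragment below (Matlab being the language of the supplementary code) is an illustrative sketch only: the one-unit-per-feature encoding and all variable names are assumptions for illustration, not the topologically organized representation actually used in the network.

% Illustrative sketch only: the Figure 4 taxonomy written as binary
% feature vectors (one entry per feature label; variable names assumed).
features = {'A','B','C','D','E', ...
            'F1','G1','H1','F2','G2','F3','G3','H3','I3','L3', ...
            'F4','G4','H4','I4','L4','M4','F5','G5','H5'};
objects = { ...
    'Object1', {'A','B','C','F1','G1','H1'}; ...
    'Object2', {'A','B','C','F2','G2'}; ...
    'Object3', {'A','B','C','F3','G3','H3','I3','L3'}; ...
    'Object4', {'A','D','E','F4','G4','H4','L4','I4','M4'}; ...
    'Object5', {'A','D','E','F5','G5','H5'}};
P = zeros(size(objects,1), numel(features));    % one binary pattern per row
for k = 1:size(objects,1)
    P(k, ismember(features, objects{k,2})) = 1; % activate shared and distinctive features
end

In such a sketch, row k of P would serve as the schematic input pattern for object k during a simulation.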

Figure 5

Figure 6
[Plot: synaptic weight of the H3-F3 and H4-G4 connections (y-axis, 0.00 to 0.08) as a function of the threshold (x-axis, 0.3 to 0.7).]

Figure 7

Figure 8

Figure 9

Figure 10

Figure 11

Figure 12a

Figure 12b

LEGENDS TO THE SUPPLEMENTARY MATERIALS

The first supplementary material, “Instructions”, contains the instructions for the use of the different programs, incorporated as Matlab files in the other supplementary materials. The second supplementary material contains the Matlab code “Macro object training”, used to generate the various epochs that train the semantic network (phase I). The third supplementary material, named “Object training”, contains the Matlab code that performs a single simulation and implements the Hebb rule for the semantic synapses. The fourth supplementary material, named “Macro lexical training”, contains the Matlab code used to generate the various epochs that train the semantic-lexical connections (phase II). The fifth supplementary material, named “Lexical training”, contains the Matlab code that performs a single simulation and implements the Hebb rule for the semantic-lexical synapses. Finally, the sixth supplementary material, named “Object recognition-naming tasks”, contains the Matlab code used to test the model behaviour after training, by performing either an object naming task or a word recognition task.
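As a rough guide to what the training scripts compute, the Matlab fragment below sketches a Hebb rule with distinct pre- and post-synaptic thresholds of the kind referred to above for the semantic synapses. The functional form, parameter values, and variable names are assumptions made for illustration; they do not reproduce the actual supplementary code.

% Minimal sketch (assumed form) of a Hebb rule with separate pre- and
% post-synaptic thresholds; not the authors' actual implementation.
x         = [1 1 1 0 0 1 1 1];  % example activity of eight semantic units for one object
gamma     = 0.05;               % learning rate (assumed value)
thetaPre  = 0.6;                % pre-synaptic threshold (assumed value)
thetaPost = 0.4;                % post-synaptic threshold (assumed value)
Wmax      = 0.08;               % upper saturation of each synapse (assumed value)

W = zeros(numel(x));            % W(i,j): synapse from pre-synaptic unit j to post-synaptic unit i

% One Hebbian update: the weight change is the product of (post-synaptic
% activity minus thetaPost) and (pre-synaptic activity minus thetaPre).
% Because thetaPre differs from thetaPost, the increments of W(i,j) and
% W(j,i) differ whenever only one of the two units is active, so repeated
% presentations build up asymmetrical synaptic patterns.
dW = gamma * (x(:) - thetaPost) * (x(:)' - thetaPre);
W  = max(min(W + dW, Wmax), 0); % keep each weight between 0 and Wmax
W(1:numel(x)+1:end) = 0;        % no self-connections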