Neurocomputing 26-27 (1999) 963-970
Neuroscholar 1.00, a neuroinformatics databasing website
Gully A.P.C. Burns*
Hedco Neuroscience Building, Room 428, University of Southern California, 3614 Watt Way, Los Angeles, CA 90089-2520, USA
Abstract
Describing neuroanatomical circuitry at the systems level requires the use of databases to collate published descriptions of neuroanatomical data (Burns, D.Phil. Thesis, Physiology Department, Oxford University, 1997; Felleman and Van Essen, Cerebral Cortex, 1, 1991, 1-47; Scannell et al., J. Neurosci. 15, 1995, 1463-83; Young, Proc. R. Soc. London Ser. B, 252, 1993, 13-18). These data are then analyzed with computational techniques. These methods do not address the problem that qualitative neuroanatomical descriptions can be interpreted in different ways, and they rely on the collator's skill to produce the correct interpretation. I describe a knowledge-base management system called 'NeuroScholar', designed to store multiple interpretations of neuroanatomical tract-tracing data in a neuroanatomically consistent framework. I illustrate how this system may be used in conjunction with data-mining analyses. © 1999 Elsevier Science B.V. All rights reserved.
* Corresponding author. Tel.: +1-213-740-7489; fax: +1-213-741-0561. E-mail address: [email protected] (G.A.P.C. Burns)
0925-2312/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0925-2312(99)00092-2
1. Introduction
Descriptions of neural connections are ubiquitous in neuroscience. They form the fundamental basis of our understanding of the organization of the nervous system, and yet a global description of the neural circuitry of the whole brain has never been produced in a single computational framework. This is due partly to the large number of systems-level structures in the brain (at least one thousand in the rat [16]) and partly to the way in which neuroanatomical experiments are performed and their results reported. These problems require solutions from the realm of neuroinformatics: the application of computer science and information science to the
management and processing of data, information and knowledge related to neuroscience [8]. This paper describes the conceptual design and preliminary implementation of 'NeuroScholar', a knowledge-based management system (KBMS) for summarizing and interpreting neuroanatomical data from the literature.
Consider 'data' as the lowest level of known facts, 'information' as data that have been sorted, analyzed and interpreted, and 'knowledge' as information that has been placed in the context of other known information [2]. In a research article, the introduction and conclusions comprise the knowledge presented in the paper. In a subject based on human interpretation such as neuroanatomy, the concepts that form the subject's basic theory are founded on this kind of knowledge. Effective computational knowledge management may be extremely productive in this context [1].
Much of the literature describing neuroanatomical connectivity is incomplete, cumbersome, error-prone, and largely qualitative. Large collations of neuroanatomical connection data are published as conventional review papers in journals and textbooks. A relatively new strategy has been to collate and publish large collections of connection data and then to analyze the network of inter-area connections with mathematical methods [3,5,11,18,19]. These studies are concerned with systems of between 30 and 100 brain structures and may be considered overviews of the literature from the collator's viewpoint. In all cases, the published descriptions of connections were intuitively translated into a single global parcellation scheme that had been adopted by the collator. These collections are usually represented as computer databases (or flat files) that can be made accessible to the worldwide neuroscience community on the Internet.
The task of populating such a database is extremely time consuming [1], and a collaborative methodology involving many collators is probably the only practical way of distributing the workload. The NeuroScholar project is a website acting as the user interface for a database of neuroanatomical connection reports from the literature. The NeuroScholar project website is accessible via the USC Brain Project homepage at http://wwwhbp.usc.edu/HBP/. The basic specification of NeuroScholar's design is as follows.
1. To provide a data resource of neural connectivity data in the rat brain that is accessible to the worldwide neuroscience community.
2. To permit users to make individual annotations to the connectivity data, and allow them to interpret the data as freely as possible.
3. To give users the choice of keeping their interpretations private, sharing them with other trusted users, or making them publicly available within the system.
4. To allow users to construct measures of reliability for the data based on the techniques employed in the experiment, the authors of the data and other factors.
5. To cross-reference the data in order to identify contradictions and discrepancies in the literature.
6. To utilize existing statistical methods to match parcellation schemes underlying data from the literature to that of an atlas, and to evaluate connection strength.
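
Items 2, 3 and 5 of this specification can be illustrated with a minimal Python sketch. It is not the NeuroScholar implementation: the record layout, the names `Interpretation`, `Visibility` and `contradictions`, and the reduction of an interpretation to a single connected/not-connected judgement are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class Visibility(Enum):
    PRIVATE = "private"  # visible to the author only
    SHARED = "shared"    # visible to the author and named trusted users
    PUBLIC = "public"    # visible to every user of the system


@dataclass
class Interpretation:
    """One user's reading of one connection report (hypothetical layout)."""
    report_id: str   # the published report being interpreted
    author: str      # unique identifier of the user who entered it
    source: str      # structure said to contain the somata
    target: str      # structure said to contain the terminals
    connected: bool  # does this user read the report as showing a connection?
    visibility: Visibility = Visibility.PRIVATE
    trusted: Set[str] = field(default_factory=set)

    def visible_to(self, user: str) -> bool:
        """Apply the private / shared / public rule of item 3."""
        if user == self.author or self.visibility is Visibility.PUBLIC:
            return True
        return self.visibility is Visibility.SHARED and user in self.trusted


def contradictions(items: List[Interpretation], user: str) -> List[Tuple[str, str]]:
    """Return the (source, target) pairs on which interpretations
    visible to `user` disagree (item 5)."""
    votes: Dict[Tuple[str, str], Set[bool]] = {}
    for item in items:
        if item.visible_to(user):
            votes.setdefault((item.source, item.target), set()).add(item.connected)
    return sorted(pair for pair, seen in votes.items() if len(seen) == 2)
```

Because each interpretation carries its author and visibility, several users can hold different readings of the same report, and a query only ever compares the readings that the querying user is allowed to see.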
2. Methods
2.1. Ontologies and database design
The design of the database has been developed to capture the concepts of neuroscience theory adequately. This is equivalent to devising a suitable 'ontology' ('descriptions of the domain knowledge of some field' [17]) for it. From a computer science perspective, we use a multi-level object-oriented approach. 'Object oriented' in this instance refers to the use of named complex data types (shown here in italics) as conceptual representations of neurobiological entities. We define brainVolumes as volumes of brain tissue with geometrical properties; cellPopulations as populations of cells; and dendrites, somata, axons and terminals objects as representations of the shared properties of the constituent neurons of cellPopulation objects (Fig. 1). The computational relationships between cellPopulation, somata, terminalField, dendrites, axons and brainVolume objects (in conjunction with other objects that have not been discussed here) accurately mimic the conceptual framework of experimental neuroanatomy [4,9,15], allowing the logical structure of complex forms of tract-tracing experiments, such as double- and triple-labeling, to be represented.
Within the NeuroScholar system, a connection can be described by stating the brainVolume properties of the somata and terminals objects of a cellPopulation. Fig. 2 shows three such connections. Part A shows an injection with an ideal anterograde tracer: the tracer is taken up within the brainVolume of a cellPopulation's somata or dendrites and will label its terminals. The reverse situation is shown in part B, where an injection with an ideal retrograde tracer has been made within the brainVolume of a cellPopulation's terminals and will label its somata. A non-ideal (i.e. real) situation is shown in part C, where the injection was made with a tracer that suffers from 'fibers of passage' labeling.
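
The ontology and the three tracer cases of Fig. 2 can be sketched as follows. This is an illustrative Python sketch rather than the Informix schema: the class layout, the `TRACER_RULES` table and the function `labelled_compartments` are hypothetical, and each compartment is assumed to lie in a single brainVolume, which is a simplification (axons in particular may cross several volumes).

```python
from dataclasses import dataclass
from typing import Dict, Set

COMPARTMENTS = ("somata", "dendrites", "axons", "terminals")


@dataclass(frozen=True)
class BrainVolume:
    """A named volume of brain tissue (geometry omitted in this sketch)."""
    name: str


@dataclass
class CellPopulation:
    """A population of cells; each compartment is located in one BrainVolume."""
    name: str
    somata: BrainVolume
    dendrites: BrainVolume
    axons: BrainVolume
    terminals: BrainVolume

    def compartments_in(self, volume: BrainVolume) -> Set[str]:
        """Names of the compartments of this population lying in `volume`."""
        return {c for c in COMPARTMENTS if getattr(self, c) == volume}


# For each tracer type: uptake compartment -> compartments that may be labelled.
# The three entries encode parts A, B and C of Fig. 2 respectively.
TRACER_RULES: Dict[str, Dict[str, Set[str]]] = {
    "ideal_anterograde": {"somata": {"terminals"}, "dendrites": {"terminals"}},
    "ideal_retrograde": {"terminals": {"somata"}},
    "fibers_of_passage": {"axons": {"terminals", "somata", "dendrites"}},
}


def labelled_compartments(pop: CellPopulation, injection_site: BrainVolume,
                          tracer: str) -> Set[str]:
    """Compartments of `pop` that an injection at `injection_site` may label."""
    rules = TRACER_RULES[tracer]
    labelled: Set[str] = set()
    for compartment in pop.compartments_in(injection_site):
        labelled |= rules.get(compartment, set())
    return labelled
```

Keeping the tracer behaviour in a table separate from the stored observations is one way to allow the reappraisal discussed in the text: changing a tracer's rule changes the interpretation without touching the recorded data.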
Fig. 1. A schematic view of the neuroanatomical ontology used in this system for one cell type.
Fig. 2. Illustration of how the neuroanatomical ontology can interpret the logic of tract-tracing experiments (see text).
In the fibers-of-passage case, the tracer is taken up within the brainVolume of a cellPopulation's axons and may label its terminals, somata or dendrites. If a given tracer is known to be very clean (such as Phaseolus vulgaris leucoagglutinin, for
example, [7]), then the database structure as shown can explicitly represent this. If, subsequent to data entry, the method was found not to suffer from the fibers of passage problem, the database design would allow straightforward reappraisal of the data. No existing connectivity summary databases allow users to differentiate between cell types and brain regions, a fundamental feature of neuroanatomical organization [3,5,11,14,18,19].
2.2. Dealing with uncertainty
Knowledge is subjective since different people will interpret the same information in different ways. Knowledge is also politically sensitive since different users may differ in their opinions concerning the validity of data. Therefore it is essential that data be represented in as objective a manner as possible (for an innovative way of achieving this, see [14]). The approach adopted in NeuroScholar is to implement a 1 to m to n relationship between data, information and knowledge, so that the system can support multiple interpretations by different users. Security measures will be implemented to allow users to select the knowledge they wish to 'publish' within the system, rendering it accessible to other users' queries. Global statistics may be generated to point out contradictions between different interpretations, to highlight areas of controversy, and to estimate the perceived reliability of specific data.
2.3. Implementation
At the time of writing, the development of NeuroScholar is at the prototyping stage. We have fully designed an object-relational Informix database structured to allow for
ease of extensibility, so that more complex data can be incorporated into its structure without requiring new tables to be added. A trimmed-down version of this database permits the storage of references and citations, and provides a simple design environment for the development of a preliminary graphical user interface (GUI) for NeuroScholar. The GUI uses a web browser (Netscape 4.05) as its windowing system and consists of HTML files that are generated dynamically by a Javascript 1.2 program, with GUI-to-database interactions mediated by Java 1.1.6 applets that can be triggered from Javascript under the OpenConnect protocol [6]. The Javascript code is extensible to permit the generation and linking together of an arbitrary number of data tables, which will allow this prototype GUI to be extended for the full connectivity database. The use of client-side Javascript reduces the load on the server, and the GUI minimizes the number of queries acting on the database in order to promote high performance. It will be possible to save NeuroScholar data files as HTML documents with embedded Javascript variables on the client machine. This will permit remote users to save data in the event of a system crash or network error. The security of the system is of fundamental importance to the project, and all data will be tagged with the unique identifier of the individual who entered them.
2.4. Data mining analysis
Non-metric multidimensional scaling (NMDS) represents the 'similarity' between entities as spatial proximity between points in a multidimensional space. It was originally devised as a tool for visualization in psychometrics [13], but has since been applied in a wide range of applications including the analysis of neuroanatomical connections [18]. NMDS offers a useful tool for visualization and data exploration rather than precise and highly specific analysis (see Fig. 3). It is particularly useful for examining ordinal-level data [10].
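
The sense in which NMDS uses only ordinal information can be made concrete with the stress measure of [10], which compares the inter-point distances of a configuration with their best monotone fit to the rank order of the dissimilarities. The following Python sketch evaluates that measure for a given configuration; it is not the SAS MDS procedure used in this work, it does not perform the optimization that searches for a low-stress configuration, and the function names `pava` and `kruskal_stress` are illustrative. Ties in the dissimilarities are simply left in their sorted order, which is an assumption of this sketch.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pava(values: Sequence[float]) -> List[float]:
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks: List[List[float]] = []  # each block holds [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the block means break monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted: List[float] = []
    for s, n in blocks:
        fitted.extend([s / n] * int(n))
    return fitted


def kruskal_stress(dissim: Sequence[Sequence[float]],
                   points: Sequence[Sequence[float]]) -> float:
    """Stress of a configuration `points` for a dissimilarity matrix `dissim`.

    Only the rank order of the dissimilarities is used. The result is
    undefined (division by zero) if all points coincide.
    """
    pairs = list(combinations(range(len(points)), 2))
    # Order the pairs from most similar to least similar.
    pairs.sort(key=lambda p: dissim[p[0]][p[1]])
    dists = [math.dist(points[i], points[j]) for i, j in pairs]
    fitted = pava(dists)  # monotone regression of distances on rank order
    numerator = sum((d - f) ** 2 for d, f in zip(dists, fitted))
    denominator = sum(d ** 2 for d in dists)
    return math.sqrt(numerator / denominator)
```

A stress of zero means that the distances in the configuration reproduce the rank order of the dissimilarities exactly; larger values indicate a poorer ordinal fit.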
NMDS may produce variable outputs depending on the values chosen for the analysis parameters. In order to represent this variability, we have devised a way of gathering statistics about the relative clustering of points within output configurations through the use of non-parametric cluster analysis. Methods of cluster analysis based on non-parametric density estimation can detect clusters of unequal size and dispersion or with highly irregular shapes (see SAS/STAT User's Guide Ch. 19, The MODECLUS procedure). We use the MDS and MODECLUS procedures from the SAS/STAT statistics software. We use Perl programs to automate the generation of SAS input scripts, their execution and the subsequent interpretation of output files, so that this approach can be incorporated into our overall analysis package. This analysis software is fully functional in its present form. We will design and build a graphical user interface for this software, so that, given a connection matrix, users can execute these analyses and display the results. The performance of these programs will allow NMDS visualizations to be generated in real time, whereas the statistics will be generated over a longer period (users would request them and then receive them via e-mail).
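
One simple statistic of this kind is the fraction of analysis runs in which two structures fall into the same cluster. The Python sketch below assumes that each run has already been reduced to a mapping from structure name to cluster label (for example by a density-based clustering of one NMDS configuration); it does not perform the clustering itself, and the function name `coclustering_frequency` and the choice of statistic are illustrative assumptions rather than a description of the analysis package.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def coclustering_frequency(runs: List[Dict[str, int]]) -> Dict[Tuple[str, str], float]:
    """Fraction of runs in which each pair of structures shares a cluster.

    `runs` holds one dict per analysis run, mapping structure name to the
    cluster label assigned in that run. Every run must cover the same names.
    """
    names = sorted(runs[0])
    frequency: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(names, 2):
        together = sum(1 for labels in runs if labels[a] == labels[b])
        frequency[(a, b)] = together / len(runs)
    return frequency
```

Pairs with a frequency near 1 cluster together regardless of the parameter values chosen, whereas intermediate values flag groupings that depend on the analysis settings.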
Fig. 3. NMDS configuration of all areas receiving input from the retina, abbreviations from [16]. The retina lies in the center and the remaining structures in the configuration may be categorized as shown.
3. Results
Fig. 3 illustrates the use of this data-mining technique with a configuration derived from the network of connections between all structures that receive input from the retinal ganglion cells within the retina. In this analysis, the retina sits at the center of the configuration and radiates connections to each and every area. The most interconnected structures in this configuration are the brain structures that are often considered to be the early visual system [12], lying along the right-hand side of the configuration. There are roughly three groups within this sector of the figure. The superior colliculus and its subnuclei (SC, SCi, SCig, SCs, SCsg, SCop) lie close to the dorsal part of the lateral geniculate nucleus and the lateral posterior nucleus, at the top of the figure. Approximately midway down the side of the figure are the ventral part of the lateral geniculate nucleus and its lateral subnucleus (LGv, LGvl), the pretectum, the optic pretectal nucleus, the posterior pretectal nucleus (PRT, OP, PPT) and the laterodorsal nucleus of the thalamus (LD). At the bottom of the figure lies a group made up of the parts of the accessory optic system (AOS, DT, LT, MT), and the nucleus of the optic tract (NOT). The top left-hand corner holds an interconnected set of structures made up of the lateral and medial preoptic nuclei (LPO, MPN), the medial preoptic area (MPO), the lateral hypothalamic area (LHA), the paraventricular nucleus of the hypothalamus (PVH), the bed nucleus of the stria terminalis (BST), the medial nucleus of the amygdala (MEA), the anterior
hypothalamic area (AHA), and the anteroventral preoptic nucleus (AVP). Several structures lie at positions in between these two major groups. The intergeniculate leaflet (IGL), the external nucleus of the inferior colliculus (ICe) and the ventromedial part of the lateral geniculate nucleus (LGvm) lie to either side of the retina, equidistant from both groups. The suprachiasmatic nucleus (SCH), the retrochiasmatic nucleus (RCH), the subparaventricular zone (SBPV) and the olfactory tubercle (OT) lie between the retina and the group of structures containing the preoptic nuclei. Areas at the bottom of the figure (AHNa, NDBh, AV, AD, PV, SO, SOperi, MA) have very sparse connectivity with the other structures in the system, and are constrained to lie close together in order to be as far from the other structures as possible. Fig. 3 shows how the organization of the system may be represented in a systematic, objective manner through the use of the methods described here.
4. Conclusions
I have described a prototypical knowledge-based management system for neuroanatomical data called 'NeuroScholar', and illustrated how such a system may be used to represent and interpret systems-level connectivity data. The system is being developed under the University of Southern California Brain Project (USCBP).
References
[1] F. Bloom, Neuroscience-knowledge management: slow change so far, Trends Neurosci. 18 (1995) 9-48.
[2] B. Blum, Clinical Information Systems, Springer, New York, 1986.
[3] G.A.P.C. Burns, Neural connectivity of the rat: Theory, methods and applications, D.Phil. Thesis, Physiology Department, Oxford University, 1997.
[4] A. Dashti, S. Ghandeharizadeh, J. Stone, L. Swanson, R. Thompson, Database challenges and solutions in neuroscientific applications, Neuroimage 5 (1997) 97-115.
[5] D.J. Felleman, D.C. Van Essen, Distributed hierarchical processing in the primate cerebral cortex, Cerebral Cortex 1 (1991) 1-47.
[6] D. Flanagan, Javascript, the definitive guide, O'Reilly and Associates, Sebastopol, 1997.
[7] C.R. Gerfen, P.E. Sawchenko, An anterograde neuroanatomical tracing method that shows detailed morphology of neurons, their axons and terminals: immunohistochemical localization of an axonally transported plant lectin, Phaseolus-vulgaris leukoagglutinin (PHA-L), Brain Res. 290 (1984) 219-238.
[8] S. Gorn, Informatics (computer and information science): Its ideology, methodology, and sociology, in: F. Machlup, U. Mansfield (Eds.), The study of information: Interdisciplinary messages, Wiley, New York, 1983, pp. 121-140.
[9] L. Heimer, M.J. Robards, Neuroanatomical Tract-tracing Techniques, Plenum Press, New York, 1981.
[10] J.B. Kruskal, Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Psychometrika 29 (1964) 1-27.
[11] J.W. Scannell, C. Blakemore, M.P. Young, Analysis of connectivity in the cat cerebral cortex, J. Neurosci. 15 (1995) 1463-1483.
[12] A.J. Sefton, B. Dreher, Visual System, in: G. Paxinos (Ed.), The Rat Nervous System, Academic Press, Sydney, 1995, pp. 833-898.
[13] R.N. Shepard, The analysis of proximities: multidimensional scaling with an unknown distance function, I, Psychometrika 27 (1962) 219-246.
[14] K.E. Stephan, R. Kötter, Objective Relational Transformation (ORT): A new foundation for connectivity databases, 1999, this volume.
[15] L.W. Swanson, Biochemical switching in hypothalamic circuits mediating responses to stress, Prog. Brain Res. 87 (1991) 181-200.
[16] L.W. Swanson, Brain Maps: Structure of the Rat Brain, second edition, Elsevier, Amsterdam, 1998.
[17] G. van Heijst, A. Schreiber, B. Wielinga, Using Explicit Ontologies in KBS development, Int. J. Human-Computer Studies 45 (1997) 183-292.
[18] M.P. Young, Objective analysis of the topological organization of the primate cortical visual system, Nature 358 (1992) 152-155.
[19] M.P. Young, The organization of neural systems in the primate cerebral cortex, Proc. R. Soc. London Ser. B 252 (1993) 13-18.
Gully Burns graduated from Imperial College, London with a first class Bachelor of Science degree in Physics. He started his D.Phil. at the Department of Physiology in Oxford in 1992 under Malcolm P. Young. He moved to Newcastle-upon-Tyne when Dr Young took the chair of the psychology department there, and completed the work for his D.Phil. there. In 1997 he moved to the University of Southern California to work as a postdoctoral research fellow in the group of Dr Larry Swanson, funded by the USC Brain Project. His research is concerned with understanding the large-scale organization of the brain by analyzing patterns of connections between brain structures. This involves experimental work in the field of chemical neuroanatomy, and theoretical research into databases and data mining in order to quantify, organize and then analyze data describing the neuronal circuitry in a mathematically tractable way.