A graphic editor for analyzing signal-transduction pathways




Gene 259 (2000) 235–244 www.elsevier.com/locate/gene

Tomohiro Koike a,b, Andrey Rzhetsky a,c,*

a Columbia Genome Center, Columbia University, New York, NY, USA
b Hitachi Software Engineering Co., Ltd., Yokohama, Japan
c Department of Medical Informatics, Columbia University, New York, NY, USA

Received 24 April 2000; received in revised form 29 August 2000
Received by T. Gojobori

Abstract

We describe a graphical editor designed specifically to facilitate analysis and visualization of complex signal-transduction pathways. The editor provides automatic layout of complex regulatory graphs and enables users to easily maintain, edit, and exchange publication-quality images of regulatory networks. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Bioinformatics; Java application; Public-domain software; Regulatory pathway; Visualization

Abbreviations: 2D, two-dimensional; 3D, three-dimensional; API, Application Programming Interface; BMP, bitmap file format; CUtenet, Columbia University True Editor for NETworks; GA, genetic algorithms; JDK, Java development toolkit; JPEG, Joint Photographic Experts Group file format; NP-hard, problems at least as hard as the hardest problems that can be solved by a non-deterministic Turing machine in polynomial time; PICT, a bitmap format used on the Macintosh; PNG, Portable Network Graphics file format; RMI, Remote Method Invocation; SGI, Silicon Graphics Incorporated; VRML, Virtual Reality Modeling Language; WWW, world-wide web.

* Corresponding author. Tel.: +1-212-304-7552; fax: +1-212-304-5515. E-mail address: [email protected] (A. Rzhetsky)

0378-1119/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0378-1119(00)00458-3

1. Introduction

Until recently, studies in molecular biology have focused largely on the elucidation of the function and structure of individual genes and of gene families. In contrast, today’s research emphasizes understanding the mechanisms of interactions between ensembles of genes and proteins within regulatory networks. Historically, computational analysis of molecular networks began with the study of biochemical pathways in bacteria. Therefore, it is not surprising that the existing tools for analysis and visualization of molecular interactions are fine-tuned to biochemical pathways in prokaryotes (Karp, 1991, 1992, 1996a,b; Karp and Paley, 1994, 1996; Karp et al., 1996, 1997, 1999; Salamonsen et al., 1999), rather than to signal-transduction pathways in eukaryotes.

Unlike metabolic pathways in bacteria, signal-transduction pathways have an inherent duality in description: interactions between molecules can be described at two independent levels, biochemical (e.g. phosphorylation) and logical (e.g. activation). Similarly, states of most substances in signal-transduction pathways can be characterized both at logical (active or inactive) and at biochemical (phosphorylated, dephosphorylated, methylated, demethylated, etc.) levels. Although an ideal description of a signal-transduction pathway encompasses both the logical and biochemical levels for each regulatory event and substance, knowledge of the actual pathways is invariably incomplete, and some events and substances are characterized at only the logical or only the biochemical level (Rzhetsky et al., 2000). Further, unlike bacteria, multicellular organisms have multiple tissue types, complex ontogenetic development, and a sophisticated subcellular structure. All these additional pieces of information must be specified for each regulatory event, which represents a ‘link’ in a complex network.

In this article, we describe a Java application, CUtenet (Columbia University True Editor for NETworks), that we designed to facilitate the representation, visualization and analysis of signal-transduction pathways. The primary applications of CUtenet are reading and writing molecular-network data from and to text files, editing network representations, and visualizing complex networks in two or three dimensions. One of the most demanding components of this type of network visualization is the automatic layout of large directed graphs, with the aim of creating aesthetically pleasing and scientifically significant representations.

2. Materials and methods

We used the Java programming language because it is highly portable across computer platforms and has carefully designed graphical-user-interface libraries. We used the Sun Java3D package for rendering three-dimensional images of network representations. (Java3D is a part of the Java Media Framework APIs, a standard extension of Java. Currently, it is implemented for the Microsoft Windows 9x/NT, Sun Solaris, Hewlett Packard HP-UX, Linux, and SGI IRIX platforms.)

Since selection of the optimal graph layout is computationally intensive, we decided to enable our layout-optimizing module to run as a distributed system. In other words, we wanted CUtenet to be able to delegate graph-optimizing jobs to remote servers on a computer network. To realize this functionality, we used Java RMI (Remote Method Invocation), which is part of the core library of Java.

Our main development machine has an Intel Pentium III processor with a 450 MHz clock rate, augmented with a 3D-accelerating video card; the operating system is Windows NT 4.0 Workstation. For editing, compiling, and debugging the Java code, we used JBuilder 3.0 (Inprise Corporation).

2.1. Major functions of the program

2.1.1. Editing of pathway data

The most important function of CUtenet is the editing of pathway representations: CUtenet can define new substances and the interactions among them, can edit existing substances and interactions, and can delete obsolete parts of a pathway. The current version of CUtenet uses text files for storing regulatory network information and allows users to create and modify data directly using standard text editors or scripts written in Perl, awk, or other scripting languages.

2.1.2. Three-dimensional graphics editor

We chose to use a three-dimensional representation of pathway structure for two reasons: (1) three-dimensional graphics allow high-quality rendering of pathway representations for publications; and (2) representations of complicated pathways may require the third dimension to permit clear visualization of all interactions. CUtenet’s three-dimensional graphics editor can allocate basic three-dimensional objects in virtual space and can render them as a picture. Each graphical object corresponds to either a substance (gene, protein, small molecule) or an interaction between substances; each has attributes, such as color, size, and name.

A user can change the location of these substances manually. In addition, the location and size of action objects are adjusted automatically such that substances are connected properly. Currently, the program can save three-dimensional scenes in three-dimensional graphics data formats (such as VRML) and render images in two-dimensional graphics formats (such as JPEG).

2.1.3. Two-dimensional graphics editor

Because rendering a scene with numerous three-dimensional objects requires intensive computation, CUtenet runs relatively slowly on current desktop computers. Therefore, we implemented a two-dimensional graphics editor that allows for significantly faster operation with larger data sets; except for the handling of the third dimension, its functionality is identical to that of the three-dimensional editor. A user can switch between two- and three-dimensional editing at will, using the two-dimensional editor for quick drafting of the pathway and the three-dimensional editor for final publication-quality rendering.

2.2. Design of the program

2.2.1. Overall structure

CUtenet consists of seven major modules (Fig. 1): (1) graphical user interface (GUI), (2) entity, (3) core, (4) representation, (5) three-dimensional representation, (6) two-dimensional representation, and (7) layout optimizer.

The GUI module comprises a set of tools for facilitating interaction between the program and the user; the module passes messages from the user to other program modules. The entity module handles the internal representation of regulatory pathways. Data structures (objects) implemented in this component are based on an ontology model (Rzhetsky et al., 2000) designed specifically for analysis of signal-transduction pathways. The representation module contains visualization-specific data, such as color, location, and size for every object, and implements algorithms to construct two- and three-dimensional scenes for an arbitrarily oriented graph. The three-dimensional representation module contains all the tools necessary to build three-dimensional virtual worlds and to render high-quality two-dimensional projections of these worlds. The two-dimensional representation module is a two-dimensional replica of, and is completely consistent with, the three-dimensional representation module. The layout optimizer module permits the user to optimize the appearance of a graph corresponding to a pathway representation. Because we intend to use this module on remote hosts, we designed it to be used as a standalone server. Finally, the core module is the central module of CUtenet; it incorporates the remaining functions and manages interactions among the other modules.

Fig. 1. Module diagram of CUtenet. Arrows represent dependencies between modules.

2.2.2. Data components

Data structures in the program follow the modularity of the program architecture. The major distinct categories are entities (which belong to the entity module), instances (which belong to the representation module), and attributes (which belong to the representation module). The two- and three-dimensional representation modules are also associated with their own data components.

Entities represent proteins, genes, other molecules, and their complexes that appear in the pathway, as well as effects, such as osmotic shock or radiation. Whereas entities describe general properties of a type of molecule, instances represent the visual objects that correspond to entities. Moreover, a single entity can have multiple representations in a pathway, reflecting, for example, different subcellular localizations of the same substance. Every instance has a unique ID and points to an entity (more than one instance may reference a single entity). Attributes are properties of graphical objects, such as the shape, position, and color, that correspond to entity instances; each instance is associated (by a reference) with a unique attribute set.
Note that attributes are maintained and processed separately from entities and actions between entities, which are the ‘backbone’ of these pathways.
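The relationship between entities, instances, and attribute sets described in Section 2.2.2 can be illustrated with a minimal Java sketch. All class and field names below are hypothetical (the article does not list CUtenet's actual classes); the sketch only shows that several instances may reference one entity while each instance owns its own attribute set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; CUtenet's real classes are not shown in the article.
final class PathwayModelSketch {

    /** A type of substance or effect (would live in the entity module). */
    static final class Entity {
        final String name;
        Entity(String name) { this.name = name; }
    }

    /** Visual properties of one drawn object (representation module). */
    static final class AttributeSet {
        String shape = "sphere";
        int rgb = 0xFF0000;
        double x, y, z;
    }

    /** One drawn occurrence of an entity: unique ID, entity reference, own attributes. */
    static final class Instance {
        final int id;
        final Entity entity;
        final AttributeSet attributes = new AttributeSet();
        Instance(int id, Entity entity) { this.id = id; this.entity = entity; }
    }

    /** Creates instances and hands out unique instance IDs. */
    static final class Pathway {
        private final List<Instance> instances = new ArrayList<>();

        Instance addInstance(Entity entity) {
            Instance instance = new Instance(instances.size(), entity);
            instances.add(instance);
            return instance;
        }
    }

    public static void main(String[] args) {
        Pathway pathway = new Pathway();
        Entity bad = new Entity("BAD");
        // The same substance drawn twice, e.g. at two subcellular locations.
        Instance first = pathway.addInstance(bad);
        Instance second = pathway.addInstance(bad);
        second.attributes.y = 5.0; // moving one instance leaves the other untouched
        System.out.println(first.entity == second.entity); // true
        System.out.println(first.id != second.id);         // true
    }
}
```

Keeping the attribute set on the instance rather than on the entity is what lets the pathway 'backbone' stay independent of how it is drawn.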

3. Results and discussion

3.1. Portability

CUtenet is almost platform independent because it is implemented in the Java programming language. It is not fully platform independent because the current implementation of our three-dimensional representation module uses the Java3D library, which is currently platform dependent. However, Java3D is available on most of the major platforms, as mentioned in Materials and methods, and we expect CUtenet to run on these platforms with little or no modification.


The layout optimizer module can be run independently from CUtenet. It can run as a standalone application or as a server that accepts requests from remote hosts. It is a pure Java program compatible with JDK 1.1, and has been tested as a remote calculation server on IRIX, Solaris, and Windows NT.

3.2. Graphic editors

3.2.1. Object attributes

As previously discussed, every graphical object in the image has visual attributes: shape (sphere, cylinder, cone, box, and others), color (up to 16 million colors, depending on platform environment), location, size, and rotation. The user can edit these attributes directly in a special text window generated by the editor on request. The user can also modify location, scaling factor, and rotation directly by dragging objects with the computer mouse.

Caption is a special attribute of a graphical object that corresponds to the object name. Captions belong to their master objects but are implemented as independent graphical objects. The user can change attributes of captions in exactly the same way as attributes of normal objects, but the location of each caption is specified as a relative position with respect to its master object.

3.2.2. Image attributes

We defined a number of attributes that characterize the pathway image itself: a user can change the physical size of the image, the scale (how large objects are drawn), the background color, and the background image. In addition to these attributes, which both the two- and three-dimensional editors use, the three-dimensional editor provides a camera. The camera is located in the three-dimensional virtual world, and its location, direction, and angle directly affect the image. Thus, the user can move the camera and directly alter the viewpoint.

3.2.3. Handling of special entities

In our ontology, a concept called complex has multiple components, each of which can itself be a complex or a substance.
Note that an effect (such as radiation or osmotic shock) is not allowed to be a component of a complex for obvious reasons. Graphically, a complex is represented by a small rectangle (in the two-dimensional editor) or cube (in the three-dimensional editor) that corresponds to the center of the complex, and each complex component is represented by a sphere or circle grouped around the complex center. The user can toggle the visibility of the complex center and can move all components of the complex simultaneously by dragging the complex center with a mouse. The user can also move individual complex components separately.

CUtenet handles the graph edges (actions) as special objects. These actions are always drawn as arrows. A regular arrowhead corresponds to an action that results in activation of a substance downstream. A flat hammerlike arrowhead represents an inactivating or inhibiting action. An edge that has no arrowhead represents an action whose results are unknown. Action locations, rotation, and lengths are calculated automatically according to the set locations of corresponding substances; other action attributes can be changed directly by the user.

3.2.4. Saving of graphic attributes

CUtenet allows the user to save all graphic-object attributes in a text file, so that she can recover all revisions at the next editing session.

3.2.5. Saving of VRML

CUtenet allows users to save three-dimensional graphics in VRML (Virtual Reality Modeling Language) format, which is widely used and is supported by many programs. There are numerous commercial and public-domain VRML viewer programs that allow users to see a VRML scene from various angles or even to walk through the scene to see particular parts of it. VRML is designed for use on computer networks, and popular web browsers have plug-in viewers for VRML. Users can easily publish CUtenet-generated three-dimensional graphics on their web sites.

3.2.6. Saving of images

CUtenet can save graphics in more than 10 formats, including BMP, JPEG, PNG, PICT, and PostScript (without compression). In doing so, it can set the desired size and scale of the image. To create high-resolution images, CUtenet draws off-screen an image of the desired size and resolution, rather than capturing and scaling the screen version of the image.

3.2.7. Switching between the two- and three-dimensional editors

CUtenet does not allow the user to run the two- and three-dimensional editors concurrently, to avoid conflicts between two data representations and to reduce the random-access-memory use by the editor.
Instead, the user can switch between the editors, editing the pathway in the two-dimensional editor and then rendering it in the three-dimensional editor. When she makes this switch, nearly all graphical parameters are adjusted automatically (Figs. 2 and 3). The only exceptions are the camera-related parameters and the z coordinates related to location and rotation, attributes that are not available in the two-dimensional editor.

3.3. Layout optimizer

There is no known efficient algorithm that draws an arbitrary graph with oriented edges while finding the global minimum of edge intersections (Garey and Johnson, 1979, 1983; Di Battista et al., 1999). There is also no efficient way of testing whether a particular oriented graph can be drawn on a flat surface such that no edges intersect (Garey and Johnson, 1979, 1983). More precisely, both problems are NP-hard and thus essentially intractable for large graphs. Although it may not be possible to find the exact solution for large data sets, an approximate solution for many NP-hard problems can be found with reasonable speed (e.g. see Garey and Johnson, 1979, 1983).

A popular way to find an approximate solution to an NP-hard problem is to use the simulated annealing method (Metropolis et al., 1953; Kirkpatrick et al., 1983). Davidson and Harel (1996) applied this technique to graphs with non-oriented edges. We extended their technique to oriented graphs such that edges oriented up or left were penalized. In addition, we required that the angle between the pair of edges adjacent to a node of degree 2 (a node with two edges) be as close to 180° as possible.

In simulated annealing, each graph drawing is assigned an energy value: disordered drawings usually have higher energy, whereas ordered drawings tend to have lower energy. The energy of a graph is calculated as an objective function of Euclidean distances between nodes of the graph, the number of edge intersections, the direction of the edges in the graph, and the symmetry of the drawing. The system starts with a random drawing and proceeds by stochastically generating displacements of nodes. At the onset, the temperature of the system is high, resulting on average in large displacements of nodes during each random step. As the computation proceeds, the temperature gradually drops, leading to a decrease of the mean node-displacement distance. Each new state is accepted if the corresponding energy is lower than the energy of the previous step.
If the new graph’s energy is higher than the previous graph’s, the new graph is accepted with a probability that depends on the current temperature and on the difference between the energies of the two graphs.

Fig. 2. Two-dimensional graphics view of human cell-cycle control and programmed cell-death pathways. The red and blue parts of the figure indicate pro-life (cell-cycle) and pro-death (apoptosis) pathways, respectively. The pathways were compiled from the databases KEGG (Kyoto University), SPAD at http://www.grt.kyushu-u.ac.jp/spad/index.html, mammalian MAPK signaling pathways at http://kinase.oci.utoronto.ca/signallingmap.html, the Virtual Library of Cell Biology at http://vl.bwh.harvard.edu/signal_transduction.shtml, and from analysis of many review articles.

Fig. 3. Three-dimensional graphics window.

More specifically, our algorithm for calculating the energy corresponding to each pathway drawing incorporates these seven components.

1. Energies contributed by pairwise attraction and repelling forces between substances; every pair of substances was assumed to experience a repelling force inversely proportional to the distance between the substances. Two substances connected by an edge (action) were considered to be attracted to one another with a force inversely proportional to the length of the edge. The balance of these two forces for graph vertices connected by an edge determined the mean length of the average graph edge.
2. Energies contributed by the repelling forces between substance/edge pairs.
3. Energy values added as ‘penalties’ for edge crossings.
4. Energies ‘penalizing’ a substance for being too close to the figure bounds.
5. Energies ‘punishing’ substances or edges for trespassing on ‘restricted’ areas of the figure (such as the outline of a mitochondrion for non-mitochondrial proteins).
6. Energies added as penalties for the incorrect orientation of edges: typically, easy-to-read pathways flow from top to bottom and from left to right.
7. Energy penalties as a function of a ‘bad’ angle between two edges incident to the same vertex of degree two (the degree of a vertex is the number of edges incident to it): the closer the angle is to 180°, the smaller the penalty.

In addition, our algorithm allows for fixed nodes, which are counted in energy calculations but are not moved in optimization cycles, so that it can keep particular objects fixed. For example, membrane proteins may be fixed, so that they remain on the cell membrane drawn on a background image.
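The annealing loop described in this section can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the CUtenet implementation: the energy function is a toy one-term stand-in (squared deviation of each edge length from an assumed ideal length of 1) rather than the seven components listed above, the acceptance test assumes the standard Metropolis criterion exp(-ΔE/T), and the cooling factor of 0.99, the displacement scale, and all names are hypothetical.

```java
import java.util.Random;

// Illustrative sketch only; names, the toy energy, and the cooling schedule are assumptions.
final class AnnealingSketch {

    /** Assumed Metropolis rule: accept improvements; accept a worse state with probability exp(-dE/T). */
    static boolean accept(double oldEnergy, double newEnergy, double temperature, Random rng) {
        if (newEnergy <= oldEnergy) return true;
        return rng.nextDouble() < Math.exp(-(newEnergy - oldEnergy) / temperature);
    }

    /** Toy energy: squared deviation of every edge length from an ideal length of 1. */
    static double energy(double[][] pos, int[][] edges) {
        double total = 0.0;
        for (int[] edge : edges) {
            double dx = pos[edge[0]][0] - pos[edge[1]][0];
            double dy = pos[edge[0]][1] - pos[edge[1]][1];
            double deviation = Math.sqrt(dx * dx + dy * dy) - 1.0;
            total += deviation * deviation;
        }
        return total;
    }

    static double[][] copy(double[][] a) {
        double[][] c = new double[a.length][];
        for (int i = 0; i < a.length; i++) c[i] = a[i].clone();
        return c;
    }

    /** Returns the lowest-energy drawing seen; nodes flagged as fixed are never moved. */
    static double[][] anneal(double[][] start, int[][] edges, boolean[] fixed, int cycles, long seed) {
        Random rng = new Random(seed);
        double[][] current = copy(start);
        double[][] best = copy(start);
        double currentEnergy = energy(current, edges);
        double bestEnergy = currentEnergy;
        double temperature = 1.0;
        for (int cycle = 0; cycle < cycles; cycle++) {
            for (int n = 0; n < current.length; n++) {
                if (fixed[n]) continue; // fixed nodes still contribute to the energy
                double oldX = current[n][0], oldY = current[n][1];
                // Mean displacement shrinks as the temperature drops.
                current[n][0] += (rng.nextDouble() - 0.5) * temperature;
                current[n][1] += (rng.nextDouble() - 0.5) * temperature;
                double newEnergy = energy(current, edges);
                if (accept(currentEnergy, newEnergy, temperature, rng)) {
                    currentEnergy = newEnergy;
                    if (newEnergy < bestEnergy) { bestEnergy = newEnergy; best = copy(current); }
                } else {
                    current[n][0] = oldX; // reject: restore the previous position
                    current[n][1] = oldY;
                }
            }
            temperature *= 0.99; // geometric cooling (assumed schedule)
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] start = {{0, 0}, {0, 0}, {0, 0}}; // all nodes gathered at one point
        int[][] edges = {{0, 1}, {1, 2}};
        boolean[] fixed = {true, false, false};
        double[][] result = anneal(start, edges, fixed, 200, 42L);
        System.out.println("energy before: " + energy(start, edges));
        System.out.println("energy after:  " + energy(result, edges));
    }
}
```

Additional penalty terms, such as edge crossings or edge orientation, would be added inside `energy` without changing the loop, which is the flexibility the annealing approach offers.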

Currently, our optimizer produces layouts for relatively small graphs successfully and quickly (Fig. 4). For realistic (bigger) regulatory graphs of 50 nodes, the current version of the optimizer can run 1000 cycles of optimization in 20 to 30 s on our development machine and can produce pleasing graphs (Fig. 5). However, when tested with a large data set (more than 2100 nodes and 2200 edges), 400 cycles take about 7 h and the graph is not optimized sufficiently well (Fig. 6). We are working to speed up computation for larger graphs. We have tested the optimizer successfully as a remote calculation server on Windows NT, IRIX, and Solaris workstations.

Fig. 4. Example of output of our graph layout module for small graphs. All nodes were gathered at the center of the drawing area at the beginning of optimization.

3.4. Layered drawing optimizer

In a layered drawing of an oriented graph, most of the edges are oriented from top to bottom, and all vertices are situated at horizontal levels (see Di Battista et al., 1999, p. 256 for a review of related algorithms). This way of representing pathways is particularly important for our application because it mimics the handcrafted layouts of pathways that appear in research articles written by specialists in signal transduction. This layout simplifies visual identification of upstream and downstream genes in the plot, a task that can be challenging given a complex pathway without a layer structure (see Fig. 6).

Our program produces a layered graph in three separate steps. First, in layer assignment, vertices are assigned to separate levels. Second, in horizontal-coordinate assignment, nodes are arranged within each horizontal level, with the aim of reducing the number of edge intersections while ignoring the arrangement of node labels. Third, in label placement, the nodes are moved horizontally without the number of edge intersections being changed, so that there are no overlapping labels.

There are several published deterministic polynomial-time algorithms (see Di Battista et al., 1999 for a review), each fine-tuned to a specific set of optimization constraints for each of these three steps. We did not take advantage of these specialized heuristic algorithms, implementing instead more general simulated annealing and spring algorithms. Although these algorithms take more computational resources, they allow greater flexibility in introducing new optimization constraints.

We discuss each step next.

Layer assignment. Similar to the approach of Davidson and Harel (1996), our algorithm tried to minimize an energy function reduced to a sum of a few simple penalties. We designed the first penalty to keep nodes without incoming edges as close as possible to the top (zero) layer. That is, for every node with no incoming edges, the energy penalty for not being at the zero layer was proportional to the squared number of layers from the zero level. The second penalty was associated with edge orientation: each edge directed toward the top of the drawing was penalized 1 unit of energy. The third penalty was associated with the length of the edges (only the y coordinate of the edge was taken into account at this stage). The penalty was computed as the square of the difference between the edge’s y extent and an ideal edge length (1 in our setup). The annealing was implemented in exactly the same way as described earlier in this paper, except that only one coordinate was changed, all newly proposed y coordinates were non-negative integers, completely horizontal edges were disallowed, and the total number of nodes at each level was limited by a predefined maximum. This step of the layout algorithm is fast (a few seconds on an Intel Pentium III 800 MHz processor) for a relatively large graph (253 vertices and approximately the same number of edges; see Fig. 2).
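The three layer-assignment penalties can be written down as a short Java sketch. It assumes a proportionality constant of 1 for the first penalty, takes layer 0 as the top of the drawing with layer numbers increasing downward, and uses hypothetical names; constraints such as the ban on horizontal edges and the per-layer node limit are left to the move generator and are not shown.

```java
// Illustrative sketch; the weights (all 1) and the names are assumptions, not CUtenet's code.
final class LayerEnergySketch {

    /**
     * layer[n] is the non-negative integer layer of node n (0 = top);
     * each edge is {from, to}.
     */
    static double energy(int[] layer, int[][] edges) {
        boolean[] hasIncoming = new boolean[layer.length];
        double total = 0.0;
        for (int[] edge : edges) {
            int from = edge[0], to = edge[1];
            hasIncoming[to] = true;
            // Penalty 2: one unit for every edge pointing toward the top.
            if (layer[to] < layer[from]) total += 1.0;
            // Penalty 3: squared deviation of the vertical extent from the ideal length 1.
            double extent = Math.abs(layer[to] - layer[from]);
            total += (extent - 1.0) * (extent - 1.0);
        }
        // Penalty 1: nodes without incoming edges pay the squared distance from layer 0.
        for (int n = 0; n < layer.length; n++) {
            if (!hasIncoming[n]) total += (double) layer[n] * layer[n];
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] chain = {{0, 1}, {1, 2}}; // 0 -> 1 -> 2
        System.out.println(energy(new int[] {0, 1, 2}, chain)); // 0.0: ideal layering
        System.out.println(energy(new int[] {2, 1, 0}, chain)); // 6.0: two upward edges + source at layer 2
        System.out.println(energy(new int[] {0, 2, 3}, chain)); // 1.0: one edge spans two layers
    }
}
```

Because every term is a simple sum over nodes or edges, a single node move only changes the terms of its incident edges, which is consistent with this step being fast.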


Fig. 5. Examples of output of the graph layout module for small graphs. (1, 2) A simple oriented cycle laid out with up or left edge orientation penalized (1), and without the edge-orientation constraint (2). (3) Simulated annealing layout for the pathway shown in Fig. 3. Here, coordinates of membrane-bound proteins were fixed [Bcl-XL and BAD are bound to the outer mitochondrial membrane (mitochondrion shown here as a hexagon), and IL-3R, IGF1R, FAS, and an unknown receptor reside on the outer cellular membrane]; edge intersections and incorrect edge orientation (left and up) were discouraged. The resulting graph has no edge intersections. (4) Layout for the same graph with all node coordinates allowed to change. The graph has no edges crossing each other, and only two edges oriented left or up rather than right or down. (5) Layout for the same graph performed with the edge-orientation constraint relaxed; the graph has no edge intersections.

Horizontal-coordinate assignment. At this stage, we again used an energy minimization approach with two different algorithms: simulated annealing and a spring algorithm (which we describe later). At this step, the y coordinates of nodes were fixed, whereas we perturbed the x coordinates to minimize an energy function. As before, the energy function was a sum of terms defining individual constraints. The energy for a pair of nodes at the same layer was defined as the square of the difference between the ‘ideal distance’ (a preset parameter) and the absolute difference between the horizontal coordinates of the two vertices. The penalty for edge crossings was set to 1 unit of energy for each crossing. The energy penalty associated with the length of the graph edges was computed as a sum of squares of the differences between the x coordinates of the beginning and the end of each edge. The energy characterizing the interaction between each edge and each node was computed as the square of the difference between a preset threshold value and the actual distance between them. The rest of the calculation followed the standard recipe for simulated annealing. Since the energy functions at this stage are more computationally expensive than at the previous stage, just 100 optimization cycles took 4 to 7 min on the same processor and for the same graph with 253 nodes.

Label placement. Before placing labels, we scaled (enlarged) along the x axis the graph that had been optimized without consideration of label overlap, such that all labels became non-overlapping, sometimes at the expense of large empty spaces within layers. We then optimized this stretched graph again with simulated annealing or with a spring algorithm, this time taking into account the energy of label interactions. The setup of the previous step was reproduced completely except that, in this case, the penalty for an overlap between a pair of labels was not equal to zero, but rather was equal to the sixth power of the two-label overlap along the x axis.

The setup of the spring algorithm differs from the setup for the annealing optimization in only one detail: instead of computing a set of energies, we compute a set of forces (although using exactly the same formulae that we used for calculating energies). Unlike energies, forces had directions, but only the x component of each force was considered in optimization. For each node, the algorithm determined the ideal x coordinate, where the node would achieve the balance of all forces acting on it. The process was repeated for all nodes until the coordinates of all nodes ceased to change. The spring algorithm provided the same result as about 1000 cycles of simulated annealing, but at the computational cost of only 50 to 100 cycles (the same 4 to 7 min for the same 253-vertex graph).

The result of the optimization is shown in Fig. 2. The figure shows a compilation of human regulatory pathways governing cell differentiation, cycle control, and programmed death. Although the regulatory scheme is far from complete, it is far more complex than any regulatory scheme published in any review or research article that we found; manual layout of the same pathway would be a challenging task. Note that the real pathway is significantly different from a planar graph (a graph that can be plotted on a surface in such a way that no two edges intersect).

Fig. 6. Example of an unsatisfactory output of the non-layered graph layout module for a large regulatory network.

3.5. Immediate plans

3.5.1. Optimization of initial parameter set

Determination of the initial coordinates of substances significantly affects computation time. We are implementing and testing techniques for smart initial-coordinate selection; they promise to save significant time in optimization.

Another important problem with simulated annealing is finding the optimum parameter values for the fastest and most aesthetically pleasing graph. One attractive way to choose good parameter values is to use genetic algorithms. The genetic algorithm (GA) (Holland, 1975; Goldberg, 1989; Koza, 1992; Fogel, 1995; Bäck, 1996; Mitchell, 1996) is another popular optimization approach that uses a pseudorandom process. One of the features of this algorithm allows for the manual selection of results of graph drawing, while the parameters of simulated annealing are stored in ‘genomes’ of artificial creatures undergoing ‘unnatural selection’. The continuum of admissible values for each parameter is divided into a given number of intervals (for example, 64), and these intervals are labeled with integer numbers between, say, 0 and 63. The number of intervals in this method is always discrete, but it can be made large. The genome of each of the simulated creatures is simply a sequence of binary-encoded parameter values.

At the beginning of a GA run, a population of random genomes is generated. At each epoch or generation, the genomes are allowed to replicate, to recombine (to undergo ‘crossovers’ at arbitrary points of the two binary sequences), and to mutate. The size of the population is usually kept constant, while the least fit genomes are removed from the population at each generation according to the subjective beauty of the graph drawing produced with the corresponding parameter values. Several generations of such selection should lead to a set of useful parameter values that can be used in the visualization program from then on.

In summary, since simulated annealing is governed by several parameters that dramatically affect the results and speed of computation, we will use a manual-selection version of a genetic algorithm, for example that of Holland or Goldberg (Holland, 1975; Goldberg, 1989), to breed those sets of simulated annealing parameter values that produce the most aesthetically pleasing drawings.
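The genome encoding outlined above (each parameter range cut into 64 intervals, each interval index stored as a binary number) can be sketched in Java as follows. This is only an illustration of the encoding, crossover, and mutation steps under assumed conventions: 6 bits per parameter, decoding to the midpoint of the chosen interval, and single-point crossover. All names are hypothetical, and the manual selection step, in which a person ranks the resulting drawings, is not modelled.

```java
import java.util.Random;

// Illustrative sketch; the bit layout, midpoint decoding, and names are assumptions.
final class GenomeSketch {

    static final int BITS = 6; // 2^6 = 64 intervals per parameter

    /** Packs interval indices (0..63) into one binary genome, most significant bit first. */
    static boolean[] encode(int[] intervals) {
        boolean[] genome = new boolean[intervals.length * BITS];
        for (int p = 0; p < intervals.length; p++) {
            for (int b = 0; b < BITS; b++) {
                genome[p * BITS + b] = ((intervals[p] >> (BITS - 1 - b)) & 1) == 1;
            }
        }
        return genome;
    }

    /** Decodes parameter number param to the midpoint of its interval within [min, max]. */
    static double decode(boolean[] genome, int param, double min, double max) {
        int index = 0;
        for (int b = 0; b < BITS; b++) {
            index = (index << 1) | (genome[param * BITS + b] ? 1 : 0);
        }
        double width = (max - min) / (1 << BITS);
        return min + (index + 0.5) * width;
    }

    /** Single-point crossover: bits before point come from a, the rest from b. */
    static boolean[] crossover(boolean[] a, boolean[] b, int point) {
        boolean[] child = new boolean[a.length];
        for (int i = 0; i < a.length; i++) child[i] = i < point ? a[i] : b[i];
        return child;
    }

    /** Flips each bit independently with the given probability. */
    static void mutate(boolean[] genome, double rate, Random rng) {
        for (int i = 0; i < genome.length; i++) {
            if (rng.nextDouble() < rate) genome[i] = !genome[i];
        }
    }

    public static void main(String[] args) {
        boolean[] low = encode(new int[] {0, 0});
        boolean[] high = encode(new int[] {63, 63});
        boolean[] child = crossover(low, high, BITS); // first parameter from low, second from high
        System.out.println(decode(child, 0, 0.0, 64.0)); // 0.5
        System.out.println(decode(child, 1, 0.0, 64.0)); // 63.5
    }
}
```

In a full run, each decoded parameter set would drive one annealing layout, and the drawings judged least pleasing would have their genomes removed before the next generation.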

4. Web site

Additional information about CUtenet is available at http://genome6.cpmc.columbia.edu/~tkoike/cutenet/.

Acknowledgements

This study was partially supported by grants from the Center for Advanced Technology, New York State, and from Hitachi Software Engineering Co., Ltd. All product and company names mentioned in this article may be registered trademarks.

CUtenet uses the following libraries developed by third parties. Java3D is one of the Standard Extension packages of Java, which provides a high-level three-dimensional graphics API. The Windows NT version is distributed free of charge by SUN Microsystems. For more information, please see http://java.sun.com/products/javamedia/3D/index.html. CyberVRML97 for Java is a development package for VRML97/2.0 and Java3D. It is kindly distributed without charge by Mr. Satoshi Konno. For more information, please see http://www.cyber.koganei.tokyo.jp/vrml/cv97/cv97java/index.html. The JIMI Software Development Kit is a library for managing images. It is kindly distributed without charge by SUN Microsystems. For more information, please see http://java.sun.com/products/jimi/.

References

Bäck, T., 1996. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, New York.
Davidson, R., Harel, D., 1996. Drawing graphs nicely using simulated annealing. ACM Trans. Graph. 15, 301–321.
Di Battista, G., Eades, P., Tamassia, R., Tollis, I.G., 1999. Graph Drawing. Algorithms for the Visualization of Graphs. Prentice Hall, Upper Saddle River, NJ.
Fogel, D.B., 1995. Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. IEEE Press, Piscataway, NJ.
Garey, M.R., Johnson, D.S., 1979. Computers and Intractability: A Guide to the Theory of NP-completeness. W.H. Freeman, New York.
Garey, M.R., Johnson, D.S., 1983. Crossing number is NP-complete. SIAM J. Alg. Discr. Meth. 4, 312–316.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Menlo Park, CA.
Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor, MI.
Karp, P.D., 1991. Artificial intelligence methods for theory representation and hypothesis formation. Comput. Appl. Biosci. 7, 301–308.
Karp, P.D., 1992. A knowledge base of the chemical compounds of intermediary metabolism. Comput. Appl. Biosci. 8, 347–357.
Karp, P.D., 1996a. Database links are a foundation for interoperability. Trends Biotechnol. 14, 273–279.
Karp, P.D., 1996b. A protocol for maintaining multidatabase referential integrity. Pac. Symp. Biocomput., 438–445.
Karp, P.D., Paley, S.M., 1994. Representations of metabolic knowledge: pathways. Ismb 2, 203–211.
Karp, P.D., Paley, S., 1996. Integrated access to metabolic and genomic data. J. Comput. Biol. 3, 191–212.
Karp, P.D., Riley, M., Paley, S.M., Pellegrini-Toole, A., 1996. EcoCyc: encyclopedia of Escherichia coli genes and metabolism. Nucleic Acids Res. 24, 32–39.
Karp, P.D., Riley, M., Paley, S.M., Pellegrini-Toole, A., Krummenacker, M., 1997. EcoCyc: encyclopedia of Escherichia coli genes and metabolism. Nucleic Acids Res. 25, 43–51.
Karp, P.D., Riley, M., Paley, S.M., Pellegrini-Toole, A., Krummenacker, M., 1999. EcoCyc: encyclopedia of Escherichia coli genes and metabolism. Nucleic Acids Res. 27, 55–58.
Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P., 1983. Optimization by simulated annealing. Science 220, 671–680.
Koza, J.R., 1992. Genetic Programming. MIT Press, Cambridge, MA.
Metropolis, S.C., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., 1953. Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092.
Mitchell, M., 1996. An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA.
Rzhetsky, A.R., Koike, T., Kalachikov, S., Gomez, S.M., Krauthammer, M., Kaplan, S.H., Kra, P., Russo, J.J., Friedman, C., 2000. A knowledge model for analysis and simulation of regulatory networks. Bioinformatics, in press.
Salamonsen, W., Mok, K.Y., Kolatkar, P., Subbiah, S., 1999. BioJAKE: a tool for the creation, visualization and manipulation of metabolic pathways. Pac. Symp. Biocomput., 392–400.