Journal of Visual Languages and Computing (2000) 11, 323–343
doi:10.1006/jvlc.2000.0160, available online at http://www.idealibrary.com
Image Indexing and Retrieval Using Object-based Point Feature Maps

YI TAO
AND
W. I. GROSKY*
Computer Science Department, Wayne State University, Detroit, MI 48202, U.S.A., {yit, grosky}@cs.wayne.edu

Accepted 18 January 2000

Multimedia data such as audio, images, and video are semantically richer than standard alphanumeric data. Because images are combinations of objects, content-based image retrieval should allow users to query by image objects at a finer granularity than the whole image. In this paper, we present a web-based object-based image retrieval (OBIR) system. Its prototype implementation particularly explores image indexing and retrieval using object-based point feature maps. An important contribution of this work is its ability to allow a user to easily incorporate both low- and high-level semantics into an image query. This is accomplished through the inclusion of the spatial distribution of point-based image object features, the spatial distribution of the image objects themselves, and image object class identifiers. We introduce a generic image model, give our ideas on how to represent the low- and high-level semantics of an image object, discuss our notion of image object similarity, and define four types of image queries supported by the OBIR system. We also propose an application of our approach to neurological surgery training.
© 2000 Academic Press
1. Introduction

MULTIMEDIA INFORMATION SYSTEMS may be viewed as storage and retrieval systems in which large volumes of multimedia data such as audio, images, and video are created, indexed, modified, searched, and retrieved. In the last few years, content-based image retrieval has received a great deal of emphasis in the context of multimedia information systems [1–3]. Images can be associated with both low-level semantics (color, texture, shape, and various spatial constraints) and high-level semantics (correspondences between image objects and real-world objects). In order to deal with these rich semantics of images and image sequences, it is necessary to move from image-level to object-level interpretation.

Color has been used in many previous approaches [1, 4]. Most of these studies use global color information, but some have represented finer details of the color distribution [5, 6].

* We would like to acknowledge the support of NSF grant 97-29818.
Compared with such visual information as color, texture, and spatial constraints, shape is so important a feature of image objects of interest that it is an essential part of the way we interpret and interact with the real world. Shape alone may be sufficient to identify and classify an object completely and accurately. Shape, along with the various spatial constraints among multiple objects, is important data in many applications, ranging from space exploration and satellite information management to medical research and entertainment. In general, object shape is determined by the context and the observer. It has been noted [7], however, that shape information is highly resolution dependent and requires elaborate processing to extract from an image. Shape cues include only a restricted set of view invariants, such as corners and zeros of curvature.

Extraction and representation of object shape are relatively difficult tasks and have been approached in a variety of ways. In Mehtre et al. [8], shape representation techniques are broadly divided into two categories: boundary-based and region-based. One drawback of this categorization, however, is that it places shape attributes such as area, elongation, and compactness into both categories. We view shape representation techniques as falling into two distinct categories: measurement-based methods, ranging from simple, primitive measures such as area and circularity [1] to the more sophisticated measures of various moment invariants [1, 8]; and transformation-based methods, ranging from functional transformations such as Fourier descriptors [8] to structural transformations such as chain codes [9] and curvature scale space feature vectors [10]. An attempt to compare the various shape representation schemes is made in Mehtre et al. [8].

One of the earliest proposals for representing spatial relationships among constituent image objects encodes spatial information as two-dimensional (2D) strings [11]. Each image is considered a matrix of symbols, where each symbol corresponds to an image object. The corresponding 2D string is obtained by symbolic projection of these symbols along the horizontal and vertical axes, preserving the relative positions of the image objects. Spatial matching then becomes an instance of the longest common substring problem. In order to improve the performance of this technique, some 2D-string variants have been proposed, such as the extended 2D string [12], the 2D C-string [13], and the 2D C+-string [14]. Additionally, there are spatial representation methods such as the geometry-based θR-string approach [15], the spatial orientation graph approach [16], and the quadtree-based spatial arrangement of feature points approach [17]. As of yet, no definitive comparisons of these methods have been made.

Humans are much better than computers at extracting semantic information from images. We believe that complete image understanding should start from interpreting image objects and their relationships. However, this goal is still beyond the reach of the state of the art in computer vision. As of now, major approaches are still based on manual annotation of semantic information related to the objects of concern. The semantic information associated with image objects can be represented as a class object or embedded inside a class [18]. For example, some simple semantic concepts like generalization and specialization are part of the concept of a class.

In this paper, we present a web-based system called object-based image retrieval (OBIR). Its prototype implementation explores image indexing and retrieval using object-based point feature maps. An important contribution of this work is its ability to allow a user to easily incorporate both low- and high-level semantics into an image query. This is accomplished through the inclusion of the spatial distribution of point-based image object features, the spatial distribution of the image objects themselves, and image object class identifiers.
The remainder of this paper is organized as follows. In the next section, we introduce a generic image model, and then give our ideas on how to represent the low- and high-level semantics of an image object. Section 3 discusses our notion of image object similarity and defines the four types of image queries supported by the OBIR system. The prototype implementation of the OBIR system, a test database, and some sample image queries are presented in Section 4. Section 5 proposes an application of our approach to neurological surgery training. Finally, we give some concluding remarks.
2. An Image Model

An image object is either an entire image or some other meaningful portion (consisting of a union of one or more disjoint regions) of an image. Typically, an image object is a semcon [19]. For example, consider an image of a bucolic scene consisting of a horse, a cow, and a chicken in a meadow, with the sky overhead and green grass below. Examples of image objects for this image would include the entire scene (with textual descriptor Life on the Farm), the horse region, the cow region, the chicken region, the sky region(s), the grass region(s), and the animal regions (the union of the horse, cow, and chicken regions).

Now, each image object in an image database contains a set of unique and characterizing features F = {f_1, ..., f_k}. These features can be global (e.g. the perimeters, areas, or principal-axis directions of the corresponding regions) or local (e.g. the points of high curvature along the region boundaries). We believe that the nature as well as the spatial relationships of these various features can be used to characterize the corresponding image objects [6, 17, 20]. In 2D space, many of these features can be represented as sets of points. These points can be tagged with labels to capture any necessary semantics. Each individual point representing some feature of an image object we call a feature point. The entire image object is represented by a set of labeled feature points {p_1, ..., p_k}. For example, a corner point of an image region has a precise location and can be labeled with the descriptor corner point, some numerical information concerning the nature of the corner in question, as well as the region's identifier. A color histogram of an image region can be represented by a point placed at the center of mass of the given region and labeled with the descriptor color histogram, the histogram itself, as well as the region's identifier. We note that the various spatial relationships among these points are the important aspect of our work, which distinguishes it from other approaches [21].

Effective semantic representation and retrieval requires labeling the feature points of each database image object. The introduction of such feature points and associated labels effectively converts an image object into an equivalent symbolic representation, called its point feature map. We have devised an indexing mechanism to retrieve, from a given image database, all those images containing image objects whose point feature map is similar to the point feature map of a particular query image object [17]. Our approach has the following properties: it is rotation, translation, and scale invariant; it is efficient for large image collections; it is incremental, allowing us to match image objects at various levels of detail, from coarse to fine (non-matching objects are eliminated as soon as possible, during the coarser matching phases); and the retrieved images are rank-ordered on the basis of their similarity to the query image object, for subsequent browsing.
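To make this representation concrete, the following minimal Python sketch shows one way a labeled feature point and a point feature map might be modeled. The class and field names are ours for illustration; they are not part of the OBIR implementation.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass
class FeaturePoint:
    # A labeled feature point: a 2D location plus a label descriptor,
    # optional feature data, and the identifier of the owning region.
    x: float
    y: float
    descriptor: str             # e.g. "corner point" or "color histogram"
    data: Optional[Any] = None  # e.g. corner strength, or the histogram itself
    region_id: str = ""         # identifier of the image object (region)

@dataclass
class PointFeatureMap:
    # Symbolic representation of one image object as labeled feature points.
    points: List[FeaturePoint] = field(default_factory=list)
```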
2.1. Point Feature Map Representation

An important criterion for shape indexing schemes is that the shape representation should be translation, scale, and rotation independent [8]. In addition, a shape representation should possess good discriminating capability and must be robust. Color histogram indexing has proven to be very useful for content-based retrieval in image databases and is widely recognized as an image retrieval method with sufficient distinguishing capability [1, 2, 7, 22]. Given a discrete color space such as RGB or HSV, a color histogram is obtained by discretizing the image colors and counting the number of times each particular color occurs in the image. Our approach to indexing the spatial arrangement of the features of an image object is also histogram-based.

The methodology of the feature point histogram representation for image objects is quite simple. Within a given image, we first identify the particular image objects to be indexed. For each image object, we construct a corresponding point feature map. In this paper, we assume that each feature is represented by a single feature point and that each point feature map consists of feature points having the same label descriptor, such as corner point. We then construct a Delaunay triangulation [23] of these feature points. Finally, the feature point histogram is obtained by discretizing the angles produced by this triangulation and counting the number of times each discrete angle occurs in the image object of interest, given a selection criterion specifying which angles contribute to the final histogram. For example, the feature point histogram can be built by counting the two largest angles, the two smallest angles, or all three angles of each individual Delaunay triangle. The feature point histogram corresponding to the Delaunay triangulation of a set of N points can be computed in O(max(N, C_bins)) time, where C_bins is the number of histogram bins.

Our idea of using a feature point histogram to represent an image object originates from the fact that if two image objects are similar, then both of them should have the same set of feature points. Thus, each pair of corresponding triangles in the two corresponding Delaunay triangulations must be similar to each other, independent of the image object's position, scale, and rotation. In this study, corner points, which are generally high-curvature points located along the crossings of an image object's edges or boundaries, serve as the feature points for our various experiments. We previously argued for representing an image by the collection of its corner points in Ahmad and Grosky [17], which proposed an interesting technique for indexing such collections provided that the image object had been normalized. In our present histogram-based approach, the image object does not have to be normalized. Our technique also supports an incremental approach to matching, from coarse to fine, by varying the bin sizes. We note that the local movement of feature points, and even the presence of outliers, affects the Delaunay triangulation only locally. As a result, depending on the bin size, the computed feature point histogram is not appreciably changed.

For color histograms [22], the histogram of a sub-image of a given image object is a sub-histogram of the histogram of the original image object. This is not technically the case with our histograms, but good sub-image matches can usually be obtained as if this property held. In fact, any histogram-based representation is lossy and not unique; thus, as with color histogram indexing, different image objects may have the same feature point histogram representation.
As a histogram can easily be represented as a multidimensional point, standard nearest-neighbor approaches to indexing can also be used. Our various experiments [24, 25] have concluded that the standard n-dimensional L2 metric, also known as the Euclidean distance, performs very well as the similarity measure in combination with the feature point histogram computed by counting the two largest angles of each individual Delaunay triangle. More precisely, for Q the query image object, D a database image object, q_i the number of angles in the i-th bin of the query image object histogram, d_i the number of angles in the i-th bin of the database object histogram, and n the total number of histogram bins, we have

$$ d(Q, D) = \sqrt{\sum_{i=1}^{n} (q_i - d_i)^2}. $$
The distance between the query and database image objects decreases as the image objects become increasingly similar and increases as they become increasingly different. Through many experiments, we have found evidence that an image object representation using a feature point histogram provides an effective cue for image object discrimination. Theoretically, from the definition of a Delaunay triangulation, it is easily shown that the angles of the Delaunay triangles of a set of feature points remain the same under uniform translations, scalings, and rotations of the point set. An example is shown in Figure 1. Figure 1(b) shows the resulting Delaunay triangulation for the set of 31 feature (corner) points shown in Figure 1(a). Figure 1(c) shows the Delaunay triangulation of a transformed (translated, rotated, and scaled-up) version of this point set, while Figure 1(d) shows the Delaunay triangulation of another transformed (translated, rotated, and scaled-down) version. Finally, Figure 1(e) shows the feature point histogram obtained by counting the two largest angles of each individual Delaunay triangle with a bin size of 10°.
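The construction just described is straightforward to prototype. The sketch below uses SciPy's Delaunay triangulation to build a two-largest-angle histogram with a 10° bin size and compares two histograms with the Euclidean distance. The function names are ours, and the paper's own implementation may differ in details such as bin boundaries.

```python
import numpy as np
from scipy.spatial import Delaunay

def triangle_angles(p0, p1, p2):
    """Interior angles (degrees) of the triangle p0-p1-p2, by the law of cosines."""
    a = np.linalg.norm(p1 - p2)  # side opposite p0
    b = np.linalg.norm(p0 - p2)  # side opposite p1
    c = np.linalg.norm(p0 - p1)  # side opposite p2
    # clip() guards against rounding error just outside [-1, 1].
    A = np.degrees(np.arccos(np.clip((b*b + c*c - a*a) / (2*b*c), -1, 1)))
    B = np.degrees(np.arccos(np.clip((a*a + c*c - b*b) / (2*a*c), -1, 1)))
    return A, B, 180.0 - A - B

def feature_point_histogram(points, bin_size=10.0):
    """Histogram of the two largest angles of each Delaunay triangle."""
    pts = np.asarray(points, dtype=float)
    tri = Delaunay(pts)
    angles = []
    for i, j, k in tri.simplices:
        angs = sorted(triangle_angles(pts[i], pts[j], pts[k]))
        angles.extend(angs[1:])              # keep the two largest angles
    bins = np.arange(0.0, 180.0 + bin_size, bin_size)
    hist, _ = np.histogram(angles, bins=bins)
    return hist

def histogram_distance(q, d):
    """Euclidean (L2) distance between two feature point histograms."""
    return float(np.linalg.norm(np.asarray(q, float) - np.asarray(d, float)))
```

Because translating, rotating, or uniformly scaling the point set leaves the Delaunay angles unchanged, `feature_point_histogram` returns the same histogram for all such transformed versions of a point set, which is the invariance property illustrated in Figure 1.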
2.2. Spatial Constraint Filter

As we will see in Section 4, many image queries consist of finding database images containing multiple image objects in various spatial relationships to each other. This entails first finding database images containing the particular image objects and then checking whether these image objects are in the requisite spatial relationships. The degree of conformance is generally used to rank-order the query results. There are many existing techniques for these types of queries. Those techniques utilizing spatial representations by 2D strings and their variants, however, suffer from exponential time complexity in the number of image objects concerned. Methodologies utilizing a spatial orientation graph produce a list of n(n−1)/2 edges for n image objects, since each pair of image objects is associated with an edge. Since all the slope values of the edges are used to compute the resulting spatial similarity, these algorithms have quadratic time complexity.

We have implemented an efficient spatial filter that can drop those database images containing the same image objects as the query image but not satisfying the required spatial constraints.
Figure 1. (a) A set of 31 points. (b) Delaunay triangulation. (c) and (d) Other Delaunay triangulations. (e) Two-largest-angle histogram with a bin size of 10°
The filter is based on the fact that if the spatial relationships of the query and database images are similar, their resulting Delaunay triangles should have a pairwise match. To achieve this, we keep an edge list resulting from the Delaunay triangulation of the set of centroids representing the image objects of interest. Only those edges appearing in the resulting Delaunay triangulation are included in the edge list, as opposed to the exhaustive enumeration of an edge between each pair of image objects, as in Gudivada and Raghavan [16]. This results in linear time complexity and a more compact spatial representation, without losing too much precision.
The proofs of the Delaunay theorems and properties underlying our approach are beyond the scope of this paper, but can be found in O'Rourke [23].

There are O(N log N) algorithms [26, 27] among the various implementations for constructing the Delaunay triangulation of a set of N points. An informal description of the incremental algorithm we adopted is as follows. Starting with a set of at least three points, we ensure that a valid Delaunay triangulation exists after each insertion of a new point. When a new point is inserted, if it falls inside an existing Delaunay triangle, it is linked with the three vertices of that triangle; otherwise, the point lies outside the convex hull of the current triangulation and is linked with all the visible points on the hull. Next, swap tests are performed on the newly created triangles. Simply speaking, a swap test of a triangle checks whether there is an extra point inside the circumcircle of that triangle: two triangles sharing a common edge (one diagonal of the quadrilateral they form) are examined, and the vertex of one triangle not on the common edge is tested for containment in the circumcircle of the other. If it lies inside, the shared edge is replaced by the other diagonal, and swap tests are then made recursively on the updated triangles.

If the corresponding query and database image objects are inserted in the same order during the execution of the above algorithm, our experiments show that, instead of computing the slope values of edges as in Gudivada and Raghavan [16], the spatial filter performs well by checking only whether each pair of corresponding edges in the two ordered edge lists match node for node (image object for image object).
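A rough sketch of such a spatial filter follows. It triangulates the object centroids with SciPy and records each Delaunay edge as an unordered pair of object labels. The paper compares ordered edge lists under a fixed insertion order, so this label-pair check is a simplification of that test, and all names here are ours.

```python
import numpy as np
from scipy.spatial import Delaunay

def labeled_edge_set(centroids, labels):
    """Edges of the Delaunay triangulation of object centroids, each edge
    recorded as an unordered pair of object labels (at least 3 objects)."""
    tri = Delaunay(np.asarray(centroids, dtype=float))
    edges = set()
    for simplex in tri.simplices:
        for a, b in ((0, 1), (1, 2), (0, 2)):
            i, j = simplex[a], simplex[b]
            edges.add(frozenset((labels[i], labels[j])))
    return edges

def passes_spatial_filter(query_centroids, query_labels,
                          db_centroids, db_labels):
    """Keep a candidate image only if every edge of the query triangulation
    has a label-to-label counterpart in the database triangulation."""
    q = labeled_edge_set(query_centroids, query_labels)
    d = labeled_edge_set(db_centroids, db_labels)
    return q <= d
```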
2.3. Object Semantics

The process of grouping low-level image features into meaningful image objects and then automatically attaching semantic descriptions to image objects of interest is still an unsolved problem in image understanding. In order to answer a query such as finding all images containing an airplane resembling the one displayed in the query image, we provide a mechanism in our web-based OBIR indexing interface [24] that lets the user associate image objects with high-level semantic information when images are inserted into the database. The task of extracting and labeling feature points of interest should be performed consistently for both database and query images.

The mechanism provided in the OBIR system is semiautomatic. The user is first instructed to specify the URL of the image to be indexed. The system then displays the original image along with a new image obtained by applying a series of image processing procedures to the original. Next, in the window shown in Figure 2(a), the user is instructed to draw a polygon around each individual image object to be stored in the database (we assume that each image object consists of a single region). This is done by using mouse clicks to place polygon vertices around the given object. The user then clicks the button labeled Done, which pops up a new window [see Figure 2(b)] in which the selected image object can be associated with some real-world object chosen from a controlled vocabulary list. Finally, upon clicking the button labeled Index, all the feature points of the selected image objects are transformed into index entries in the image index database [24], together with their corresponding semantic information and image URL.
Figure 2. (a) Outline left image object. (b) Attach high-level semantics
Each of the indexed image objects, typically a semcon [19], contains semantic, feature, and spatial information. In order to support the various image queries (discussed in Section 4) via semcons, we propose the use of a so-called hierarchical icon graph (HIG) for organizing the is-a (specialization/generalization) relationships between different semcons. At present, the HIG is made up of three levels. At the root level, there is only one generic semcon, which can be considered an abstract class in our object-oriented approach. At the middle level, different semcons represent the corresponding object classes in the real world, and no two semcons have the same attached semantics.
Figure 3. An example hierarchical icon graph
At the leaf level, each semcon has only one parent semcon; all semcons sharing a common parent have the same semantic annotation but differ in shape, and may therefore possess different point feature maps when placed into the query specification space (discussed in Section 4). This general schema can be adapted to specific domains to achieve high performance. Figure 3 shows the HIG for our test database, with six types of image objects at the middle level and 10 different shapes at the leaf level. As shown, the user can create different queries by simply adding semcons from the icon space to the query specification space; a minimal sketch of such a graph follows.
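As an illustration only, a three-level HIG like that of Figure 3 could be modeled as a small tree of semcon nodes. The class name and fields below are assumptions for the sketch, not the system's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SemconNode:
    # One node of the hierarchical icon graph: "object" at the root,
    # an object class at the middle level, a concrete shape at a leaf.
    semantics: str
    children: List["SemconNode"] = field(default_factory=list)
    histogram: Optional[list] = None  # point feature histogram (leaves only)

# A fragment of the test database's HIG (Figure 3): six middle-level
# classes, with distinct leaf shapes sharing their parent's semantics.
root = SemconNode("object", children=[
    SemconNode("fish", children=[SemconNode("fish"), SemconNode("fish")]),
    SemconNode("leaf", children=[SemconNode("leaf"), SemconNode("leaf")]),
    # ... airplane, key, sportsman, and umbrella classes likewise
])
```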
3. Object-based Image Queries

Four types of queries for locating images in image databases have been identified [28], based on the included image objects and their spatial relationships:

(1) Retrieval by the existence of image objects, ignoring the spatial relationships among them.
(2) Retrieval by pairwise spatial relationships between image objects.
(3) Retrieval by a subimage.
(4) Retrieval by the entire image.

The OBIR system supports all four types of queries. It is based on our image model and utilizes our approaches to image object discrimination using the feature point histogram, the spatial constraint filter discussed in Section 2.2, and semantic object representation using a particular hierarchical icon graph. An image is indexed by its constituent objects (semcons), the corresponding global feature point histogram, and the unique image URL, where each image object (semcon) consists of a set of feature points characterizing its shape, the corresponding feature point histogram, its centroid, and its associated semantics (the object class of the corresponding real-world object); a minimal sketch of such an index record is given below. In the OBIR query interface, we distinguish four types of image query. The query definitions and similarity measure functions are described in the following subsections.
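For illustration, the index record just described might be modeled as follows. The field names are ours; the actual OBIR index is stored in a relational database rather than these in-memory classes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class IndexedObject:
    # One semcon as stored in the index.
    feature_points: List[Tuple[float, float]]  # corner points of its shape
    histogram: List[int]                       # its feature point histogram
    centroid: Tuple[float, float]              # used by the spatial filter
    semantics: str                             # real-world object class

@dataclass
class IndexedImage:
    url: str                                   # unique image URL
    global_histogram: List[int]                # histogram over the whole image
    objects: List[IndexedObject] = field(default_factory=list)
```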
3.1. Query-by-objects-appearing

A common query for content-based image retrieval asks to display database images that contain image objects matching, or closely similar to, the image objects in a particular query image. We call this type of retrieval a query-by-objects-appearing; it is independent of the translation, rotation, and scale of each of the constituent image objects. If both the query and the database image objects have semantic annotations, the OBIR system compares the query image object only to those image objects in the database having the same attached semantics.

Assume that the user presents a query image Q = {O_1, O_2, ..., O_M}, which consists of M image objects (semcons). To answer a query-by-objects-appearing with cutoff n, the OBIR system first determines, for each query image object O_j, 1 ≤ j ≤ M, the ordered set of the top-n matching database images, denoted Ans_n(Q_j). We note that the sets of top-n matches for these M component queries may be quite different; in general, Ans_n(Q_i) ≠ Ans_n(Q_j) for 1 ≤ i ≠ j ≤ M, whether considered as ordered or unordered sets. We then compute Ans_n(Q_1) ∩ Ans_n(Q_2) ∩ ... ∩ Ans_n(Q_M), where the Ans_n(Q_j) are considered as unordered sets. This intersection may contain k database images, 0 ≤ k ≤ n; denote this set by {A_1, A_2, ..., A_k}. An ordered set of these k images is then presented to the user as the answer to the query. This ordering can be done in many ways. One approach we have used is to define the position of A_i in the final result by the value of

$$ a_1 \cdot d(Q_1, A_i) + a_2 \cdot d(Q_2, A_i) + \cdots + a_M \cdot d(Q_M, A_i), $$

where, for 1 ≤ j ≤ M, d(Q_j, A_i) is the distance between the feature point histogram of query image object O_j and the feature point histogram of the image object corresponding to O_j found in image A_i. Also, for 1 ≤ j ≤ M, each a_j lies between 0 and 1, inclusive, and

$$ \sum_{j=1}^{M} a_j = 1. $$
A typical value for a_j which we have used is 1/M. Another approach is to define the position of A_i by the value of a_1 · pos(Q_1, A_i) + a_2 · pos(Q_2, A_i) + ... + a_M · pos(Q_M, A_i), where pos(Q_j, A_i) is the position of A_i in the ordered set Ans_n(Q_j).
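A compact sketch of the first ordering strategy follows, under the assumption that the per-object top-n answer lists and a distance function d(Q_j, A_i) are already available; all names are ours.

```python
def query_by_objects_appearing(per_object_answers, dist, weights=None):
    """Intersect the top-n answer sets and rank by the weighted sum of
    per-object histogram distances.

    per_object_answers: one ordered top-n answer list per query object O_j
    dist(j, image): the distance d(Q_j, A_i) for query object j in `image`
    """
    M = len(per_object_answers)
    weights = weights or [1.0 / M] * M       # the typical choice a_j = 1/M
    common = set(per_object_answers[0])
    for answers in per_object_answers[1:]:
        common &= set(answers)               # unordered intersection
    # Smaller weighted distance means a better match, so sort ascending.
    return sorted(common,
                  key=lambda img: sum(a * dist(j, img)
                                      for j, a in enumerate(weights)))
```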
3.2. Query-by-spatial-objects-appearing

Besides performing content-based image retrieval on the identities of the objects appearing in an image, we may also be concerned with the spatial relationships among these objects. We call this type of retrieval a query-by-spatial-objects-appearing. For this type of query, we assume that the query image contains at least three objects; if not, we query by objects appearing, as above.

Assume that the user presents a query image Q = {O_1, O_2, ..., O_M}, which consists of M objects (semcons). To compute the results of a query-by-spatial-objects-appearing with cutoff n, the OBIR system first executes a query-by-objects-appearing with cutoff n to obtain the answer set of k ordered database images. A spatial constraint filter, as discussed in Section 2.2, is then applied to this set to drop those candidate images that do not satisfy the spatial constraints among the query objects. Each image object is replaced by its centroid for the filtering step.
3.3. Query-by-subimage

In this type of query, the query image can contain several image objects, but the union of these image objects is considered a single, more complex, image object. We are looking for database images that contain a possibly translated, rotated, and scaled version of this more complex image object. Assume that the user presents a query image Q = {O_1, O_2, ..., O_M}, which consists of M objects (semcons). To compute the query-by-subimage with cutoff n, the OBIR system first executes the query-by-objects-appearing with cutoff n, but in its last step, the images in {A_1, A_2, ..., A_k} are sorted using a different distance measure. We define the position of A_i by the value of

$$ b \cdot d(Q, A_i) + c_1 \cdot d(Q_1, A_i) + c_2 \cdot d(Q_2, A_i) + \cdots + c_M \cdot d(Q_M, A_i), $$

where d(Q, A_i) is the distance between the feature point histogram of the union of query image objects O_1, ..., O_M and the feature point histogram of the feature points of the image object corresponding to this union found in image A_i. Also, 0 ≤ b ≤ 1 and, for 1 ≤ j ≤ M, c_j = a_j (1 − b).
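The re-ranking value can be computed directly from the formula above; this small helper (names ours) simply makes the weighting explicit.

```python
def subimage_position_value(b, a, d_union, d_per_object):
    """Position value for a candidate image A_i in query-by-subimage.

    b: weight on the union-histogram distance, 0 <= b <= 1
    a: per-object weights a_j summing to 1; c_j = a_j * (1 - b)
    d_union: d(Q, A_i) for the union of the query objects
    d_per_object: [d(Q_1, A_i), ..., d(Q_M, A_i)]
    """
    return b * d_union + sum(a_j * (1 - b) * d_j
                             for a_j, d_j in zip(a, d_per_object))
```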
3.4. Query-by-entire-image

Query-by-entire-image can be regarded as a special case of query-by-subimage. In the OBIR system, this type of query does not consider any semantic information in producing its results, which makes it different from the other three types of image queries. Assume that the user presents a query image Q = {O_1, O_2, ..., O_M}, which consists of M objects (semcons). To compute the results of the query-by-entire-image with cutoff n, the OBIR system uses the distance measure d(Q, D), for Q the query image and D a database image, to rank the top-n query results. The entire query and database images are used to construct the appropriate feature point histograms.
4. Object-based Image Retrieval

With the development of client/server architectures built on Internet information services, it has become increasingly desirable to access information over the Internet via web browsers. The OBIR system [24] allows users to index, query, and browse images on the WWW in an effective and efficient way. It focuses specifically on exploiting object-based image retrieval using various types of feature points, as well as spatial and semantic information.
4.1. The Architecture of the Image Indexing Interface

The OBIR image indexing interface is a three-tier client/server application. The three-tier model is employed because the middle tier offers control over access authentication and data updates. With an application server as the middle tier, the network is not involved as long as the user operates within the graphical interface of the Java applet. Only when the user requires a computational response from the server, after labeling all the image objects, are requests sent to the application server located on the web server side. The communication between the client (Java applet) and the application server is implemented through plain Java sockets provided by the java.net package. The application server robustly manages the various connections. Through the JDBC API (the JDBC driver for Microsoft SQL Server 6.0), the computed results are stored in the database via SQL statements. The client then receives a reply to its request from the server. A scenario of the image indexing procedure is given in Section 2.3. The detailed architecture for image indexing is shown in Figure 4(a).
4.2. The Architecture of the Image Query Interface

The OBIR image query interface is divided into two parts: the client and the server. The network communication between the client (web browser) and the server (web server) is done via the Java Database Connectivity API (the JDBC driver for Microsoft SQL Server 6.0) for data stored in the MS SQL Server relational database, and via HTTP for any non-relational data, such as database images and the icons representing semcons. The client is responsible for creating image queries, computing the results based on the query information, and displaying the query results. The server is responsible for sending the necessary indexing information to the client. Additionally, the server maintains two data repositories: the image storage and a relational database. The detailed architecture for image query is shown in Figure 4(b).
4.3. Database Population

In order to demonstrate the efficacy of our object-based image model and the image query interface of the OBIR system, we have constructed a test database. We started with 24 single-object images downloaded from the web, comprising 10 fish, 10 leaves, one airplane, one key, one sportsman, and one umbrella. Using Photoshop 4.0, we then created 24 semcons from this set of images.
Figure 4. (a) The system architecture for image indexing. (b) The system architecture for image query
An additional 192 single-object images were then created by randomly rotating, scaling, and translating the 24 original image objects. Furthermore, 26 images containing two objects, 20 images containing three objects, 20 images containing four objects, and 20 images containing five objects were created by selecting from this set of 216 single-object images. For ease of manipulation, each of the multiple-object images contains only distinct objects. The test database thus contains a total of 302 images. Through the image indexing interface of the OBIR system, all images were processed and their index entries stored in the relational database.
4.4. Sample Queries

In order to support effective image retrieval, a query interface should allow users to describe image semantics and visualize queries in a natural way. Through our image query interface, users can flexibly query the image database by simply clicking and moving the mouse in the query specification window to select or delete indexed image objects (semcons).
For a user-created image query, the selected image objects are visually similar to, and satisfy the spatial constraints of, the real-world objects envisioned in the user's mind. This permits users to query the system based on the visual contents of an image without forcing them to know the exact values of the image features. This semcon-based approach to query image specification is well supported by our approach to image indexing. In the following, we give some sample queries and compare their results.

Query-by-objects-appearing versus query-by-spatial-objects-appearing. Figure 5(a) illustrates the first image query, which includes semcons of one leaf, one fish, one sportsman, and one umbrella. The result of the query-by-objects-appearing is shown in Figures 5(b) and 5(c). For this type of query, it does not matter where a particular semcon is located in the query specification space. Figure 5(d) illustrates the same set of semcons, with the spatial relationships shown by the corresponding Delaunay triangulation. The result of the query-by-spatial-objects-appearing does not contain the image of Figure 5(b), which is dropped during the spatial filtering step. Therefore, the result of the second query consists only of the image shown in Figure 5(c).

Query-by-objects-appearing versus query-by-subimage. Figure 6(a) illustrates the third image query, which includes the same set of semcons as the first and second queries. Compared with the results of the first query, the query-by-subimage returns the same set of two database images, but in a different rank order. For the first query, the result shown in Figure 6(b) has a lower similarity measure than that shown in Figure 6(c). For the third query, both the local spatial information (each semcon individually) and the global spatial information (the four semcons taken as a complete unit) are taken into account in re-ranking the results. The third query image is therefore more likely to be a subimage of Figure 6(b) than of Figure 6(c).

Query-by-spatial-objects-appearing versus query-by-subimage. From the results of the second and third queries, we conclude that query-by-spatial-objects-appearing validates the presence of the appropriate image objects as well as the spatial relationships between each pair of image objects. These spatial relationships are validated at the filtering step, where each image object is represented as a labeled feature point located at the object's centroid; all images not satisfying the appropriate spatial constraints are dropped. Query-by-subimage, on the other hand, treats the union of the query image objects as a complete unit when re-ranking the database images produced by a query-by-objects-appearing.

Query-by-entire-image versus the others. Query-by-entire-image is the simplest query. It ranks the query results based on global feature point histograms only, and also ensures that each image in the query results contains the same number of image objects as the query image.
5. An Application of Our Approach to Neurological Surgery Training

Even now, the most common form of surgical training follows the traditional learning-by-doing rule. In general, inexperienced surgeons learn their skills from their tutors while operating on patients. It is also known that surgeons can improve their skills by reading the diagnostic records of former patients. For example, an inexperienced neurosurgeon may want to browse through the prior surgical planning data of those patients who have pathologies resembling those of the patient under treatment.
Figure 5. (a) First query. (b) Result image 1 of 2.
Figure 5. (c) Result image 2 of 2. (d) Second query
Figure 6. (a) Third query. (b) Result image 1 of 2.
Figure 6. (c) Result image 2 of 2
This entails a neurological surgery training system that allows neurosurgeons to index, query, and browse various medical records in an effective and efficient way.

In 2D space, we have proposed an MRI image model to organize patients' medical records. As shown in Figure 7, it includes not only textual annotations about the MRI slices, but also a visual representation of pathologies, such as lesion regions, their spatial locations, and the spatial correlations among several lesion regions [29]. The methodology of lesion representation using feature point histograms in 2D space can be naturally extended to 3D space. Specifically, a lesion volume can be approximated by a polyhedron. Delaunay triangulation of a polyhedron results in a Delaunay tetrahedrization: just as three points are needed to describe a plane in space, at least four points are needed to describe a volume element, so the most elementary volume element is a tetrahedron, specified by four points. An example is shown in Figure 8, the 3D Delaunay triangulation of a 7-point polyhedron. At a finer level, we can count the triangles produced by this tetrahedrization, with each triangle distinguished by its two largest angles, to compute the resulting feature point histogram.

Building on this work, we are initiating a project for the design and implementation of a web-based system for neurological surgery training, in conjunction with the Neurological Surgery Department of Wayne State University.
Figure 7. An MRI brain image model
Figure 8. Delaunay triangulation of a polyhedron
The final system will support not only traditional text-based queries, but will also allow neurosurgeons to query medical images by visual content without forcing them to know the exact values of the image features. For example, a more involved query is to find prior patients' records in which segmented lesions in the same location of the MRI brain images have a shape similar to those of the patient under treatment.
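A sketch of the 3D extension follows, using SciPy's Delaunay, which in three dimensions produces tetrahedra. The text does not specify whether a face shared by two tetrahedra is counted once or twice; this sketch counts each distinct face once, and all function names are ours.

```python
import numpy as np
from itertools import combinations
from scipy.spatial import Delaunay

def _triangle_angles(p0, p1, p2):
    # Interior angles (degrees) of a triangle in 3D, via the law of cosines.
    a = np.linalg.norm(p1 - p2)   # side opposite p0
    b = np.linalg.norm(p0 - p2)   # side opposite p1
    c = np.linalg.norm(p0 - p1)   # side opposite p2
    A = np.degrees(np.arccos(np.clip((b*b + c*c - a*a) / (2*b*c), -1, 1)))
    B = np.degrees(np.arccos(np.clip((a*a + c*c - b*b) / (2*a*c), -1, 1)))
    return A, B, 180.0 - A - B

def lesion_volume_histogram(points3d, bin_size=10.0):
    """Feature point histogram for a 3D point set: tetrahedrize the points,
    then histogram the two largest angles of each triangular face."""
    tet = Delaunay(np.asarray(points3d, dtype=float))  # 3D => tetrahedra
    faces = set()
    for simplex in tet.simplices:                      # 4 vertices each
        for face in combinations(sorted(simplex), 3):
            faces.add(face)                            # shared faces once
    angles = []
    for i, j, k in faces:
        angs = _triangle_angles(tet.points[i], tet.points[j], tet.points[k])
        angles.extend(sorted(angs)[1:])                # two largest angles
    bins = np.arange(0.0, 180.0 + bin_size, bin_size)
    hist, _ = np.histogram(angles, bins=bins)
    return hist
```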
6. Conclusion

In this paper, we have described the image indexing and query interfaces of our web-based OBIR system. We have developed an image model that represents a semcon via a point feature map, which models the spatial relationships of the various point-based features of an image object. We have incorporated a hierarchical icon graph to naturally represent the relationships among various semcons, and have let users flexibly create image queries with semcons.
Through a relatively small test database, we have illustrated four types of image queries, which are well supported by the front end of our image query interface and the back end of our image index storage. We are now instantiating our OBIR system for an application in neurological surgery training. We are also refining the object-based image model to integrate more features, such as color and texture, into a semcon's representation.
References

1. W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, P. Yanker, C. Faloutsos & G. Taubin (1993) The QBIC project: querying images by content using color, texture, and shape. In: Proceedings of SPIE Storage and Retrieval for Image and Video Databases, Vol. 1908, January, pp. 173–181.
2. J. K. Wu, A. D. Narasimhalu, B. M. Mehtre, C. P. Lam & Y. J. Gao (1995) CORE: a content-based retrieval engine for multimedia information systems. Multimedia Systems 3, 25–41.
3. W. I. Grosky (1997) Managing multimedia information in database systems. Communications of the ACM 40, 73–80.
4. V. E. Ogle & M. Stonebraker (1995) Chabot: retrieval from a relational database of images. IEEE Computer 29, 40–48.
5. J. Huang, S. R. Kumar, M. Mitra, W.-J. Zhu & R. Zabih (1997) Image indexing using color correlograms. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, June, pp. 762–768.
6. J. R. Smith & S.-F. Chang, Integrated spatial and point feature map query. ACM Multimedia Systems Journal (to appear).
7. M. J. Swain & D. H. Ballard (1991) Color indexing. International Journal of Computer Vision 7, 11–32.
8. B. M. Mehtre, M. S. Kankanhalli & W.-F. Lee (1997) Shape measures for content based image retrieval: a comparison. Information Processing & Management 33, 319–337.
9. G. Lu (1997) An approach to image retrieval based on shape. Journal of Information Science 23, 119–127.
10. F. Mokhtarian, S. Abbasi & J. Kittler (1996) Efficient and robust retrieval by shape content through curvature scale space. In: Proceedings of the International Workshop on Image Database and Multimedia Search, Amsterdam, The Netherlands, August, pp. 35–42.
11. S.-K. Chang, Q.-Y. Shi & C.-W. Yan (1987) Iconic indexing by 2D strings. IEEE Transactions on Pattern Analysis and Machine Intelligence 9, 413–428.
12. E. Jungert & S. K. Chang (1989) An algebra for symbolic image manipulation and transformation. In: Proceedings of the IFIP TC 2/WG 2.6 Working Conference on Visual Database Systems. Elsevier Science Publishing Company, Amsterdam, The Netherlands, pp. 301–317.
13. S.-Y. Lee & F.-J. Hsu (1990) 2D C-string: a new spatial knowledge representation for image database systems. Pattern Recognition 23, 1077–1087.
14. P. W. Huang & Y. R. Jean (1994) Using 2D C+-strings as spatial knowledge representation for image database systems. Pattern Recognition 27, 1249–1257.
15. V. N. Gudivada (1998) θR-string: a geometry-based representation for efficient and effective retrieval of images by spatial similarity. IEEE Transactions on Knowledge and Data Engineering 10, 504–512.
16. V. N. Gudivada & V. V. Raghavan (1995) Design and evaluation of algorithms for image retrieval by spatial similarity. ACM Transactions on Information Systems 13, 115–144.
17. I. Ahmad & W. I. Grosky (1997) Spatial similarity-based retrievals and image indexing by hierarchical decomposition. In: Proceedings of the International Database Engineering and Applications Symposium (IDEAS '97), Montreal, Canada, August, pp. 269–278.
18. Z. Jiang, W. I. Grosky & L. Zamorano (1996) Immersive database: concepts and preliminary study. The Journal of Medicine and Virtual Reality 1, 20–26.
19. W. I. Grosky, F. Fotouhi & Z. Jiang (1997) Using metadata for the intelligent browsing of structured media objects. In: Managing Multimedia Data: Using Metadata to Integrate and Apply Digital Data (A. Sheth & W. Klas, eds). McGraw-Hill Publishing Company, New York, pp. 67–92.
20. W. Hsu, T. S. Chua & H. K. Pung (1995) An integrated color-spatial approach to content-based image retrieval. In: Proceedings of ACM Multimedia, San Francisco, CA, November, pp. 305–313.
21. H. V. Jagadish (1991) A retrieval technique for similar shapes. In: Proceedings of the ACM SIGMOD Conference, Denver, CO, June, pp. 208–217.
22. J. Hafner, H. S. Sawhney, W. Equitz, M. Flickner & W. Niblack (1995) Efficient color histogram indexing for quadratic form distance functions. IEEE Transactions on Pattern Analysis and Machine Intelligence 17, 729–736.
23. J. O'Rourke (1994) Computational Geometry in C. Cambridge University Press, Cambridge, England.
24. Y. Tao & W. I. Grosky (1998) Image matching using the OBIR system with feature point histograms. In: Proceedings of the 4th IFIP 2.6 Working Conference on Visual Database Systems (VDB4), L'Aquila, Italy, May, pp. 192–197.
25. Y. Tao & W. I. Grosky (1999) Delaunay triangulation for image object indexing: a novel method for shape representation. In: Proceedings of IS&T/SPIE's Symposium on Storage and Retrieval for Image and Video Databases VII, San Jose, CA, 23–29 January, pp. 631–642.
26. R. A. Dwyer (1987) A faster divide-and-conquer algorithm for constructing Delaunay triangulations. Algorithmica 2, 127–151.
27. S. Fortune (1987) A sweepline algorithm for Voronoi diagrams. Algorithmica 2, 153–174.
28. S.-Y. Lee & M.-K. Shan (1990) Access methods of image database. International Journal of Pattern Recognition and Artificial Intelligence 4, 27–44.
29. Y. Tao, W. I. Grosky, L. Zamorano, Z. Jiang & J. Gong (1999) Segmentation and representation of lesions in MRI brain images. In: Proceedings of SPIE's Symposium on Medical Imaging (MI '99), San Diego, CA, 21–26 February, pp. 930–939.