Vistas in Astronomy, Vol. 39, pp. 37-45, 1995
Copyright © 1995 Elsevier Science Ltd. Printed in Great Britain. All rights reserved.
0083-6656/95 $29.00
0083-6656(95)00042-9

Efficient Information Access in an On-Line Image Archive

A. Csillaghy
Institute of Astronomy, ETH-Zentrum, CH-8092 Zurich, Switzerland *
Abstract: The rapid development of world-wide networking during the last two years, due partly to the simple concept of addressing with uniform resource locators (URLs), has drastically simplified information exchange. This expanding communication medium has created new possibilities for accessing large databases, and has therefore attracted particular interest in astronomy. The most widely used means of information transfer is text. If large sets of multi-dimensional data, commonly images, have to be accessed, the longer transfer times compared with text urge the data manager to find more efficient solutions. Current tools browse images over a network less efficiently than they browse text. Either the images being browsed are too large to be transferred quickly, or the image size is reduced so much that the interesting information is lost during the data reduction. In this article, an alternative to the transfer of plain images is investigated, based on the fact that only a part of an image’s information content is needed for browsing. Images may therefore be reduced to icons which represent the information content only symbolically. Such image icons are used to browse solar radio spectrograms in the Zurich Radio Astronomy Group archive.
1. LOOKING FOR A NEEDLE IN A HAYSTACK
Developments in high-speed computer networks have increased the possibilities of worldwide information exchange of all kinds. Although text is the most common information carrier, many research domains, such as astronomy, can increase research ‘productivity’ by making many multi-dimensional data sets accessible on-line. In this way, astrophysicists gain faster access to observational material. Since browsing large multi-dimensional data sets is unrealistic, and often unnecessary, the data sets have to be reduced to their essentials. Often, they are reduced to a textual description contained in a relatively small file header (as in the FITS [15] format), so that information browsing can be done through a relational, textual database [7]. In other cases, associated text can be used to search for information about an image [11]. However, when such methods fail, i.e. when the users have no a priori information about the data, browsing among the contents of the database becomes necessary.

Images are mostly used for the analysis of multi-dimensional data sets because they are a simple and comprehensive way to represent data, and can be displayed on (nearly) every computer. In the following discussion, we will therefore concentrate on them. As modern astronomical instruments are getting more sensitive (mainly with the development of CCD cameras), tremendous amounts of data can be recorded. Recording rates will soon be of the order of GBytes per day. Such praiseworthy observational improvements raise new concerns about making data accessible within a useful time. The user should save time and get more information than with older methods. Furthermore, a modern data archive must be accessible through the Internet in order to gain consideration in the astronomical community. Therefore, astronomical information retrieval can no longer be considered without networking issues.

In this paper, we address the question of how to browse efficiently among a large quantity of images, especially through the Internet. We focus on direct image browsing, i.e. without considering methods related to associated text. As mentioned before, this method is intended for the case where no a priori information about the observations is available. The problem can be compared with looking for a needle in a haystack, without knowing what the needle looks like!

Efficient browsing, during the search for (and not the analysis of) information, can be achieved by reducing the original data to its essential form. Interesting image parts (e.g. a globular cluster in a galaxy) must be enhanced and the rest (e.g. the background radiation) neglected. To achieve this goal, Hinterberger [9] proposed to get away from the bitmap and to inspect the distribution of the pixels in a higher-dimensional space. By partitioning the latter into regions with constant point numbers, the original data set can be well compressed, and specific regions may be enhanced. Such a method is applied to browsing among solar radio spectrograms recorded by the spectrometers of the Radio Astronomy Group (RAG) at ETH.

* email: [email protected]
2. BROWSING SYMBOLIC IMAGES INSTEAD OF BITMAPS

2.1. Hypertext and Hyperimage

Hypertext browsers for HTML documents, such as NCSA Mosaic, have been developed for browsing text. The purpose of images in HTML documents is above all illustration, either as buttons, anchors or decoration, and therefore they do not contribute to the main purpose of fast information browsing. In the case where the main source of information is contained in images and cannot be reduced to text, an efficiency problem occurs, since hypertext cannot be extended in a straightforward manner to a hyperimage. Naturally, because of the difference of one or more orders of magnitude in storage size, image transfer is slower than text transfer, even with compressed formats (for example, GIF files). In addition to the inherently large size of multi-dimensional data sets, the fact that there is one hypertext transfer protocol (HTTP) request per image, and the limited capacities of the server and the network, contribute to slowing down the image transfer process significantly. Since the number of Web users is increasing rapidly [10], it is likely that this situation will get worse. These arguments lead to the exploration of transformation methods which could describe the image’s information content in a way other than with a bitmap.
[Figure 1: axis labels recovered from the original: frequency ticks from 250 to 950, time in UT, duration in s.]
Fig. 1. Comparison between an original data set (left) and the icon produced automatically (right). The data represent a solar radio observation (called spectrogram, see below) where enhanced emission is shown white. Note that the frequencies are not equidistant in the original data.
2.2. Symbolic Image Representation
For browsing, the full information content of an image is not necessary. A rough image description may be enough to choose or neglect data for further analysis. Following this hypothesis, we investigate an image transformation which graphically describes the main features of the original image with less storage. In this approach, the bitmap is replaced by a symbolic representation, called the image icon, using a stack of simple geometric forms, e.g. rectangles (strictly speaking, hyperboxes for multi-dimensional data sets). The image is simplified by considering similar pixels, i.e. pixels with nearby coordinates and color values, as belonging to the same entities. As a first example, an image icon is compared with its original counterpart in Fig. 1. Horizontal lines are interference and have been automatically identified and suppressed. The original image is represented with 400 rectangles and the compression factor is 18.

Transformations of bitmaps into image icons have the following advantages over conventional data reduction methods, such as integration, nearest neighbor or LZW compression:
(1) Image parts important to the application can be artificially enhanced;
(2) Icons need not be decompressed before being displayed;
(3) They compress large images efficiently (with loss). For a 1 MByte image, the factor is about 115 [5];
(4) An overview of the image content can be made freely available while the original data remain protected.
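The idea of an icon as a stack of rectangles that is drawn directly, without a decompression step, can be illustrated with a minimal Python sketch. It is not the author's implementation: the names `Box` and `render_icon`, the half-extent convention and the integer gray values are assumptions made only for this illustration.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """One rectangle of an icon (hypothetical layout, not from the paper)."""
    cx: float   # centre, horizontal coordinate
    cy: float   # centre, vertical coordinate
    hx: float   # half-extent, horizontal
    hy: float   # half-extent, vertical
    gray: int   # gray value used to paint the rectangle


def render_icon(boxes: List[Box], width: int, height: int,
                background: int = 0) -> List[List[int]]:
    """Paint the boxes in order on a width x height canvas.

    Later boxes cover earlier ones, which is what "stack" means here.
    """
    canvas = [[background] * width for _ in range(height)]
    for b in boxes:
        # Clip each rectangle to the canvas before painting it.
        x0 = max(0, int(round(b.cx - b.hx)))
        x1 = min(width - 1, int(round(b.cx + b.hx)))
        y0 = max(0, int(round(b.cy - b.hy)))
        y1 = min(height - 1, int(round(b.cy + b.hy)))
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                canvas[y][x] = b.gray
    return canvas


# Two boxes (10 stored numbers) stand in for a 4 x 8 bitmap (32 pixels).
icon = [Box(2.0, 2.0, 1.0, 1.0, 5), Box(5.0, 1.0, 1.0, 0.0, 9)]
canvas = render_icon(icon, width=8, height=4)
```

In this sketch the storage cost grows with the number of boxes rather than with the number of pixels, which is the property the list above relies on.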
3. THE CONSTRUCTION OF IMAGE ICONS
There is a multitude of ways to abstract, or extract, information from a data set; for example, contouring, clustering, smoothing, etc. [1]. Other techniques involve the analysis of the data in Fourier or wavelet space [4]. We have chosen an approach based on a partition of the parameter space (i.e. for an image, the space spanned by the coordinates and the color value) into regions containing a fixed number of points. The overall scheme of the process is described in Fig. 2 and is discussed below.

[Figure 2: flow chart. Original image -> Parametrization -> Representation in parameter space -> Introduction into a grid file -> Grid directory -> Region reduction -> Boxes -> Boxes selection -> Image icon.]
Fig. 2. The process used to produce icons. After representing the image in its parameter space, the parametrized pixels, or points, are inserted in a grid file. The grid directory, a virtual representation of the parameter space, is then analyzed and boxes, characterizing the pixel distribution, finally undergo a selection which yields the final icon.
3.1. Representation of an Image in its Parameter Space
We discuss three-dimensional data sets, which are usually represented as images, although the generalization to higher dimensions is straightforward. Let us define a finite set S of three-dimensional elements:

$$S = \{ (x_i^1, x_i^2, x_i^3) : i = 0, \ldots, n-1 \}, \qquad (1)$$

where n is the number of elements in the set. A pixmap can be represented in this manner when pixels are parametrized, i.e. the coordinates $x_i^1$ and $x_i^2$ are explicitly associated with the value $x_i^3$. In the parameter space, data points may form clusters, as shown in Fig. 3a for a simpler two-dimensional case. The parameter space is partitioned into regions with a fixed number of points (Fig. 3b). Therefore, regions including similar points will be smaller than regions with different points (Fig. 3c).

3.2. Practical Implementation
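The parametrization and partitioning of Section 3.1 can be illustrated with a minimal Python sketch under stated assumptions: the image is a plain list of rows, and the partition is a recursive median split (in the style of a k-d tree), swapped in here for simplicity. It is not the grid file of [8][12]. The names `parametrize`, `partition` and `capacity` are hypothetical.

```python
from typing import List, Tuple

# (x1, x2, x3) = (column, row, pixel value), as in Eq. (1)
Point = Tuple[float, float, float]


def parametrize(image: List[List[float]]) -> List[Point]:
    """Attach the coordinates explicitly to every pixel value."""
    return [(float(col), float(row), float(value))
            for row, line in enumerate(image)
            for col, value in enumerate(line)]


def partition(points: List[Point], capacity: int) -> List[List[Point]]:
    """Split until every region holds at most `capacity` points.

    Assumption: a recursive median split along the attribute with the
    widest range, not the grid file used in the paper.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if len(points) <= capacity:
        return [points]
    # Range of each of the three attributes over the current region.
    spans = [max(p[a] for p in points) - min(p[a] for p in points)
             for a in range(3)]
    axis = spans.index(max(spans))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # both halves are non-empty, so this terminates
    return (partition(ordered[:mid], capacity)
            + partition(ordered[mid:], capacity))


# A 4 x 4 test image: a bright 2 x 2 patch on a dark background.
image = [[0, 0, 0, 0],
         [0, 9, 9, 0],
         [0, 9, 9, 0],
         [0, 0, 0, 0]]
points = parametrize(image)
regions = partition(points, capacity=4)
```

Because every region is limited by its point count and not by its size, regions in densely populated parts of the parameter space end up small, which is the effect described for Fig. 3c. The attributes are not rescaled in this sketch, so the choice of the split axis depends on the units of each attribute.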
The principle of parameter space partitioning is used in many spatial data structures [14]. Originally used to save storage or to speed up data access, spatial data structures lend themselves to visualization because similar data points are aggregated. We realized the space partitioning by inserting the data points into a grid file [8][12]. In the latter, the data structure called the grid directory renders the partition. To visualize the grid directory correctly, regions are analyzed individually to localize the points (and therefore to eliminate regions with no points at all). Boxes are fitted to the data by computing the data’s mean and standard deviation for each attribute i and region:

$$\bar{x}^i = \frac{1}{N} \sum_{j=1}^{N} x_j^i, \qquad i = 1, 2, 3, \qquad (2)$$

where N is the number of points in a region. The six-tuple $\langle \bar{x}^1, \bar{x}^2, \bar{x}^3, \sigma^1, \sigma^2, \sigma^3 \rangle$ is named a 3-box. It is displayed on the screen, as shown in Fig. 4, as a rectangle centred at $\langle \bar{x}^1, \bar{x}^2 \rangle$, with an extent given by $\sigma^1$ and $\sigma^2$. The color of the box is inferred from the value of $\bar{x}^3$.

3.3. Selection of the Right Set to Represent the Information

The set of boxes derived from the grid regions possesses characteristic shapes. It gives hints about where the information is in the image, and simplifies the information extraction. Many boxes do not contribute to the visualization of information and can be eliminated. For example, a box filling a large volume is unlikely to contribute to information extraction. In contrast, a box with a small volume implies similar values in a region. An exhaustive analysis of selection criteria will be the topic of future work. We notice that the selection criteria for boxes can be
[Figure 3: four panels (a) to (d) showing a two-dimensional data set plotted against Attribute 1 and Attribute 2.]
Fig. 3. A two-dimensional data set is represented in its attribute space. (a) Similar data points, or records, form clusters. (b) The space is partitioned into regions containing no more than a given number of points, here 3. (c) For each region, a box is adjusted to the data depending on the points’ distribution in the region. (d) The real data is ‘forgotten’ and only boxes are considered.
[Figure 4: a box (inner rectangle) drawn inside a region (outer rectangle); recovered axis labels: min(x), center(x), min(y), center(y), max(y).]
Fig. 4. A graphic representation of the parameters defining a box (inner) and a region (outer). A box adapts better to the real distribution of the data (assuming only one cluster), because the variance does not react strongly to outlying points. μ(x) corresponds to Eq. (2) and σ(x) to Eq. (3).
imposed on the values defining a box and on any of their combinations. In addition, the occupancy of a box can also be used. Figure 1 shows an original image and the corresponding icon. The size has been reduced from 56 FITS records to 3, i.e. by a factor of about 18. To build this icon, the following criteria have been applied:
(1) Background elimination: the gray scale value $\bar{x}^3$ of a box must be greater than 1.
(2) The extents $\sigma^1$, $\sigma^2$, $\sigma^3$ of a box must each be less than 10% of the whole range of the corresponding dimension.
(3) Only the 400 boxes with the smallest radii, $\lVert \langle x^1, x^2 \rangle \rVert$, are displayed.

At the present time, the criteria and thresholds used for the selection have been chosen empirically. They are application-dependent and rely strongly on what should be enhanced in a data set. Further work is currently underway to develop a selection rule based on a systematic measure of the information.

3.4. Image Icons on the WWW

Image icons were developed independently of the World-Wide Web (which did not yet exist). The original idea was to implement a visual menu interface to data analysis software, giving direct visual access to the whole catalog of observations of the RAG (see below), i.e. by browsing icons instead of searching through a relational database without knowledge about the information content of the images. Image icons are well suited to replace plain image browsing on the WWW. Icons considerably limit the number of bytes which must be sent, relative to the information content, and therefore speed up the search for information. Moreover, icons encode the original data, so that browsing can be authorized anonymously while access to the original data remains protected.
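The box fitting of Section 3.2 and the three selection criteria of Section 3.3 can be sketched in a few lines of Python. This is a hedged illustration, not the author's code. Three assumptions are made: the standard deviation is the population form (division by N), the radius of criterion (3) is taken as the norm of the first two extents, and the names `Box3`, `fit_box`, `select_boxes` and `ranges` are hypothetical.

```python
from math import hypot
from statistics import fmean, pstdev
from typing import List, NamedTuple, Sequence, Tuple

Point = Tuple[float, float, float]


class Box3(NamedTuple):
    """A 3-box: per-attribute mean and standard deviation of one region."""
    mean: Tuple[float, ...]
    sigma: Tuple[float, ...]


def fit_box(region: Sequence[Point]) -> Box3:
    """Fit a box to a non-empty region (mean as in Eq. (2))."""
    columns = list(zip(*region))  # one tuple of values per attribute
    return Box3(tuple(fmean(c) for c in columns),
                # Assumption: population standard deviation (1/N).
                tuple(pstdev(c) for c in columns))


def select_boxes(boxes: List[Box3], ranges: Sequence[float],
                 min_gray: float = 1.0, max_fraction: float = 0.10,
                 keep: int = 400) -> List[Box3]:
    """Apply criteria (1) to (3) with the thresholds quoted in the text."""
    kept = [b for b in boxes
            if b.mean[2] > min_gray                      # (1) background
            and all(s < max_fraction * r                 # (2) small extent
                    for s, r in zip(b.sigma, ranges))]
    # (3) Assumption: radius = norm of the two spatial extents.
    kept.sort(key=lambda b: hypot(b.sigma[0], b.sigma[1]))
    return kept[:keep]


# A compact bright region and a spread-out dark one (made-up points).
bright = fit_box([(10.0, 10.0, 8.0), (11.0, 10.0, 9.0),
                  (10.0, 11.0, 9.0), (11.0, 11.0, 8.0)])
background = fit_box([(0.0, 0.0, 0.0), (50.0, 0.0, 0.0),
                      (0.0, 50.0, 1.0), (50.0, 50.0, 1.0)])
selected = select_boxes([bright, background], ranges=(100.0, 100.0, 10.0))
```

Here only the compact bright box survives: the spread-out box fails criterion (1) because its mean gray value is not above 1, and criterion (2) because its extents exceed 10% of the coordinate range.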
4. AN IMAGE BROWSER TO ACCESS SOLAR RADIO SPECTROGRAMS

4.1. Spectrograms: A Good Test-Bed for Image Browsing
In solar astronomy, spectrograms are used to reveal different types of radio emission that correlate with flare energy release in the solar corona, and therefore help to understand its heating mechanisms [2]. A spectrogram has been shown in Fig. 1. A collection of different events is accessible on-line (footnote 1). Spectrograms, although viewed as images, have coordinates in time and frequency. Color encodes the flux density in solar flux units.

The archive of digital spectrograms recorded by two spectrometers, Ikarus [13] and Phoenix [3], during the past 20 years contains about 10 GBytes of data. Observations were originally stored on magnetic tapes in an internal format. Raw data records are currently being calibrated and transformed into FITS [15]. Files are then stored on magneto-optical disks. In order to allow on-line access to the whole archive, it will eventually be transferred to CD-ROMs. The spectrometer is currently being upgraded for a better time/frequency resolution. The next operation phase will be directed towards continuous observations and will significantly increase the available data. At this stage, efficient browsing will be unavoidable.

4.2. WWW Interface to the Archive

The browser consists of a query form, an icon browser and a file identification card. Although the WWW interface is discussed here, it would be worthwhile to skip this section of the article and try it for yourself (footnote *) instead.

The query form allows specification of the search range. Further development will allow improved specification of the search range, notably by selecting similar icons [6]. Queries are done on a relational database containing general information about the observations, such as start time, observed frequencies, etc. When the query is submitted, the program checks whether or not there is an associated icon. If there is, the icon is included in the newly created HTML page.

The icon browser is generated automatically by a script managing the output of the relational query. The generated page includes anchors to frequency programs, to a larger view of the icon, and to the event identification card. The event identification card displays an enlarged view of an icon. However, the original data is not loaded, allowing it to be kept in a restricted area.

The file identification card contains, if accessible, the FITS file header of the original data. If the original data is accessible on-line, and if the user is authorized, the original file can be downloaded to the remote host.
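The step in which the query output is turned into an HTML page, with an icon included only where one exists, might look like the following Python sketch. It is an illustration under assumptions and does not reproduce the script used for the archive: the field names `id`, `start_time` and `frequencies`, the `card/` URL prefix, the icon paths and the sample rows are all hypothetical.

```python
from html import escape
from typing import Dict, List


def icon_browser_page(rows: List[Dict[str, str]],
                      icon_urls: Dict[str, str],
                      card_url_prefix: str = "card/") -> str:
    """Build one HTML list item per query result.

    `rows` stands for the output of the relational query; `icon_urls`
    maps an observation id to its icon, for observations that have one.
    """
    items = []
    for row in rows:
        obs_id = row["id"]
        label = escape(f'{row["start_time"]} ({row["frequencies"]})')
        icon = icon_urls.get(obs_id)
        if icon is not None:
            # An icon exists: show it next to the textual description.
            body = (f'<img src="{escape(icon)}" '
                    f'alt="icon {escape(obs_id)}"> {label}')
        else:
            body = label
        # Every entry is an anchor to its identification card.
        href = escape(card_url_prefix + obs_id)
        items.append(f'<li><a href="{href}">{body}</a></li>')
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"


# Made-up query output: only the first observation has an icon.
rows = [
    {"id": "obs-001", "start_time": "start A", "frequencies": "band A"},
    {"id": "obs-002", "start_time": "start B", "frequencies": "band B"},
]
page = icon_browser_page(rows, {"obs-001": "icons/obs-001.gif"})
```

The page carries only the icons and the anchors; the original data stay on the server until the identification card is followed, which matches the restricted-area arrangement described above.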
5. CONCLUSIONS AND OUTLOOK
We have shown a way to reduce the size of images in order to implement efficient on-line browsing of a large image database. Icons are produced by considering the similarity of pixels in their parameter space. By applying constraints on regions in the parameter space, the image is reduced to a smaller representation, enhancing regions of importance in the observations and neglecting others, such as background.

A WWW browser is adequate for the implementation of an icon browser, allowing internal and external access to the image database and therefore limiting the development time of user interfaces. At this stage, all icons are transmitted to the user interface, even those which contain no information. Further work will concentrate on automatically analyzing whether or not an icon contains information, in order to send only the ‘interesting’ icons. To achieve this goal, the concept of information content will have to be defined more clearly. Another topic for development will be the creation of icons with better visual qualities, by using a methodical approach based on analysis of the information content.

1 http://mimas.ethz.ch/IKARUS-PHOENIX/figures/catalogue-figures.html
* http://mimas.ethz.chIKARUS-PHOENWwaw.html
Acknowledgements

I acknowledge helpful discussions with A. O. Benz and H. Hinterberger. I also thank A. O. Benz, S. Krucker and K. A. Meier for proof-reading the manuscript. The spectrogram observations were carried out with the help of all the members of the Radio Astronomy Group in Zurich. This project is partly supported by the Swiss National Science Foundation, Grant Nr. 20-040336.94.
References

[1] Banks, S. (1990) Signal Processing, Image Processing and Pattern Recognition. Prentice Hall, New York.
[2] Benz, A. O. (1993) Plasma Astrophysics, Kinetic Processes in Solar and Stellar Coronae. Kluwer, Dordrecht.
[3] Benz, A. O., Güdel, M., Isliker, H., Miszkowicz, S. and Stehling, W. (1991) A broadband spectrometer for decimetric and microwave radio bursts: First results. Sol. Phys. 133, 385-393. http://mimas.ethz.ch/papers/benz/phoenix/firstres.la/firstres.la.html.
[4] Chui, C. H. (1992) Wavelet Analysis and its Applications. Academic Press, New York.
[5] Csillaghy, A. (1994) Building image icons for fast browsing and classification of solar radio spectrograms, Proceedings of the First International Conference on Image Processing. Los Alamitos, IEEE Computer Society Press. http://mimas.ethz.ch/papers/csillaghy/austin.ps.
[6] Csillaghy, A. (1995) Retrieving information from solar radio spectrograms. In: Coronal Magnetic Energy Releases, A. O. Benz and A. Kruger (eds). Springer, Berlin. http://mimas.ethz.chlpapers/csillaghy/cesrahtm~~sra94-header.ps.
[7] Fullton, J. (1994) Distributed astronomical data archives. In: Astronomical Data Analysis Software and Systems III, D. R. Crabtree, R. J. Hanisch and J. Barnes (eds), Volume 61, pp. 3-9. ASP Conference Series.
[8] Hinterberger, H. (1987) Data Density: A Powerful Abstraction to Manage and Analyze Multivariate Data, PhD thesis, Swiss Federal Institute of Technology, Zurich.
[9] Hinterberger, H. (1994) Getting Out of the Bitmap and into the Picture to Manage Visualized Spatial Information, Working paper, Institute for Scientific Computing, ETH Zurich.
[10] Murtagh, F. (1994) Computer networking in astronomy. In: Databases & Online Data in Astronomy II, M. Albrecht and D. Egret (eds). Kluwer, Dordrecht. http://http.hq.eso.org/-fmurtagh/papers/albrecht-egret-chap.ps.
[11] Murtagh, F. (1994) Content-based information retrieval: New tools for textual data, new problems for image data, Proceedings of the Fourteenth International CODATA Conference, Chambery, September 1994. http://http.hq.eso.org/-fmurtagh/papers/image-retrieval-codataS4.ps.
[12] Nievergelt, J., Hinterberger, H. and Sevcik, K. (1984) The grid file: An adaptable, symmetric multikey file structure. ACM Transactions on Database Systems 9(1), 35-
[13] Perrenoud, M. (1982) The computer-controlled solar radio spectrometer “IKARUS”. Sol. Phys. 81, 197-203.
[14] Samet, H. (1990) The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, MA.
[15] Wells, D. C., Greisen, E. W. and Harten, R. H. (1981) FITS: A Flexible Image Transport System. A&AS 44, 363-370.