Integrated access to distributed data and information services in astrophysics and the space sciences


Computer Physics Communications 127 (2000) 177–187 www.elsevier.nl/locate/cpc

R.J. Hanisch¹
Space Telescope Science Institute, 3700 San Martin Drive, Baltimore, MD 21218, USA

¹ Professeur Invité, Observatoire Astronomique de Strasbourg, 11 rue de l’Université, 67000 Strasbourg, France.

0010-4655/00/$ – see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0010-4655(99)00507-X

Abstract
Scientific advances do not necessarily follow strict research discipline boundaries. In the area of astronomy and space science, data from multiple missions and observatories operating in various parts of the electromagnetic spectrum are necessary in order to answer fundamental scientific questions. However, a researcher attempting to locate, understand, and use data from a variety of sources now faces serious difficulties. Many datasets are available on the Internet, but finding the ones of relevance, especially outside of one’s immediate field of expertise, is difficult. The metadata used to annotate datasets in different fields can be unfamiliar and obscure, even though the same basic data attributes are being described. Once data has been located, it is often in different and incompatible formats. We are now developing a distributed information service for the space sciences – ISAIA – which covers many different subdisciplines with datasets of common interest and which will improve the researcher’s abilities to locate and use data from a wide variety of on-line resources. This service builds upon experience in implementing a data location service for astronomy, Astrobrowse. The key to implementing such services is the concept of profiles. Profiles are a generalization of data dictionaries: they define the metadata labels and content for information resources and provide mappings of those labels onto site- or service-specific terms and query protocols. In ISAIA, profiles may be hierarchical – general at the highest levels, with sub-profiles for certain disciplines or types of instrumentation. In practical terms, a distributed system such as ISAIA will only succeed if existing data providers can make their systems and services available to it with a minimum amount of effort. ISAIA must be “lightweight”, imposing no constraints on how an organization manages its data services internally. It must support participation at different levels, from simple registration of a service to full support for query and response interpretation. In addition, ISAIA must preserve the identity and visibility of the participating services. This is important for end-users (who must understand the provenance of the data they are using) and for the providers (who need to understand their users’ needs and the demands upon their resources).

© 2000 Elsevier Science B.V. All rights reserved.

Keywords: Information systems; Information services; Data archives; WWW

1. Current information services in astrophysics and the space sciences and the need for integrated access

The field of astronomy and astrophysics, and the space sciences more generally, enjoys a wealth of electronic data and information services. The discipline of astronomy and astrophysics, in particular, has been a leader in the development of online data services. NASA initiated an electronic Astrophysics Data System in 1987 [16], a facility that now focuses on bibliographic references and abstracts for all major astronomy publications ([6], http://adswww.harvard.edu). NASA also supports an Astronomical Data Center (ADC) at Goddard Space Flight Center (http://adc.gsfc.nasa.gov/), and in Europe the Centre de Données astronomiques de Strasbourg (CDS) (http://cdsweb.u-strasbg.fr/CDS.html) provides the SIMBAD (Set of Identifications, Measurements, and Bibliography for Astronomical Data) database of more than 2,200,000 astronomical objects having more than 5,500,000 identifiers, 100,000 bibliographical references, and nearly 3,000,000 bibliographic citations. The CDS and ADC also provide access to a large collection of astronomical catalogs and have WWW-based services for browsing and selecting objects from these catalogs. In addition to these catalog and bibliographic services, the astrophysics community has many archives of observational data available on- or near-line. These include the major NASA data centers: the Infrared Processing and Analysis Center (IPAC), which operates the Infrared Science Archive (IRSA) (http://irsa.ipac.caltech.edu) and the NASA Extragalactic Database (NED) (http://nedwww.ipac.caltech.edu); the Space Telescope Science Institute (STScI), which hosts the Hubble Space Telescope archive (http://archive.stsci.edu) and the optical/UV Multimission Archive at Space Telescope (MAST, http://archive.stsci.edu/mast.html); and the High Energy Astrophysics Science Archive Research Center (HEASARC) at NASA/Goddard Space Flight Center (http://heasarc.gsfc.nasa.gov).
NASA’s National Space Science Data Center (NSSDC) (http://nssdc.gsfc.nasa.gov) provides permanent archiving support for the front-line archive centers as well as direct access to several key astronomical data sets, such as the Cosmic Background Explorer archive. There are many other major astronomical observatories and archives available via the web, including data from a number of ground-based telescopes. The discipline of astronomy has been at the forefront of electronic publishing as well, with a full suite of on-line electronic journals and tabular data from those journals available from several large data centers (cf. [4]). Good starting points for browsing on-line astronomical resources are the AstroWeb (http://cdsweb.u-strasbg.fr/astroweb.html) and the review of on-line astronomical resources by Andernach et al. [3], updated by Andernach [2].

Other areas of space science, notably Planetary Science and Solar and Space Physics, also have numerous on-line resources. NASA’s Planetary Data System (PDS) maintains and distributes data from NASA planetary missions (http://pds.jpl.nasa.gov/). PDS is composed of a central node and a number of topical nodes, each of which takes responsibility for data in a different area of expertise. The Space Physics Data System – SPDS – is a collection of online resources from numerous space physics missions (http://spds.gsfc.nasa.gov/). The NSSDC is also a major provider of on-line data sets and information services for space physics. Solar physics data can be found at the Solar Data Analysis Center (SDAC) at Goddard Space Flight Center (http://umbra.nascom.nasa.gov/sdac.html) and at sites referenced there.

Even though there are many excellent sources of space science data available on the web, there are substantial barriers to the interoperability of these data:
• Some important data sets, especially those at high time resolution, remain unavailable in electronic form.
• Data are in incompatible formats.
• Data frequently have inconsistent or incomplete metadata, and each discipline tends to have its own metadata vocabulary.
• Good data may be available, but it can be difficult to find for a given object, class of objects, or type of phenomenon.

It is important to try to remove, or at least reduce, these barriers in order to realize the maximum scientific return from these data and to enable new, cross-cutting research using data from traditionally independent disciplines.
For example, research on comets requires data from all space science disciplines: ground-based and space-based imaging and spectroscopy, in situ measurements of the solar wind (wind speed, density, and magnetic field strength), and in situ measurements from space probes targeted at comets (Giotto, the International Cometary Explorer, future missions) in order to build a complete model. Similarly, investigation of auroral phenomena around other planets requires information about low-frequency radio emissions, data from UV imaging and spectroscopy, planetary magnetosphere data, and solar wind data. On the other hand, expertise in data is frequently associated with a specific wavelength region or phenomenon, as the data collection apparatus, calibration processes, and analysis procedures are connected to the physics of the measurement itself. Any information service that provides integrated access to data of diverse origins must also provide links to the documentation needed to understand and use the data properly and to the scientists having direct experience working with the data.

In this paper I describe concepts and plans for building an integrated service for accessing distributed information services, first in astronomy and astrophysics, and then in space science generally. Although the implementation of this service will necessarily be specific to these disciplines, the general architecture is one which pertains to any set of related disciplines in which information is heterogeneous in nature and physically distributed.

2. Models for integrated access

I like to characterize three different models for integrated access to distributed information sources as “good”, “better”, and “best”. These are distinguished by how much work the end-user has to do to find and use data of interest. These are shown schematically in Fig. 1. The goal is to provide a “best” system for integrated access to space science data.

The World Wide Web: “Good”. The situation that generally exists in space science today is certainly “good”. As mentioned above, there are literally thousands of on-line resources available. Discipline-oriented document and resource collections such as AstroWeb and SPDS allow researchers to browse these services, though the researcher may be faced with searching many tens of sites in order to find data of interest. An example from astronomy highlights the shortcomings of this situation: suppose someone wants to find images in many parts of the spectrum (from radio to X-ray) of an object they have just observed with the Hubble Space Telescope in order to make comparisons between features in different bandpasses. In general this researcher would first have to be familiar with all of the sites that might contain such data, and then go to them one by one to submit a query for the object of interest. The exact form of these queries is likely to differ from site to site and service to service, even though essentially the same type of information is being specified.

Query agents: “Better”. A query agent is a web service that knows about a class of information resources and their general attributes, i.e., what type of information is available from each resource. For example, in astronomy a query agent would know what types of data (images, spectra, time series, catalogs) in what spectral bandpasses (X-ray, ultraviolet, optical, infrared, radio) are available in each service. The user interested in ultraviolet spectra of certain objects would have queries directed to just those sites holding such data. The query agent passes the user’s query along to each site, formatting it as required. The user then interacts further, site by site, to refine the query and request specific data sets.
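The routing step that distinguishes a query agent from plain browsing can be sketched in a few lines of Python. This is a minimal illustration, not Astrobrowse or ISAIA code: the `ResourceProfile` class, the service names `archive-A`, `archive-B`, `catalog-C`, and their holdings are all hypothetical, and only two profile attributes (data type and bandpass) are modelled.

```python
from dataclasses import dataclass, field

@dataclass
class ResourceProfile:
    """Hypothetical, minimal resource profile: what one service holds."""
    name: str
    data_types: set = field(default_factory=set)   # e.g. {"image", "spectrum"}
    bandpasses: set = field(default_factory=set)   # e.g. {"ultraviolet"}

def select_resources(registry, data_type, bandpass):
    """Query-agent filtering step: keep only services whose resource profile
    advertises the requested data type in the requested bandpass."""
    return [r.name for r in registry
            if data_type in r.data_types and bandpass in r.bandpasses]

# Hypothetical registry entries, for illustration only.
REGISTRY = [
    ResourceProfile("archive-A", {"image", "spectrum"}, {"ultraviolet", "optical"}),
    ResourceProfile("archive-B", {"image"}, {"x-ray"}),
    ResourceProfile("catalog-C", {"catalog"}, {"optical", "infrared"}),
]

print(select_resources(REGISTRY, "spectrum", "ultraviolet"))  # ['archive-A']
```

Only the selected services would then receive the reformatted query; the user still interacts with each of them separately, which is what keeps this the “better” rather than the “best” model.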

Fig. 1. Three models for integrated access to distributed information resources. Left: The user interacts with each information resource individually. The current WWW, or discipline-specific collections of WWW resources (AstroWeb, SPDS, etc.). Middle: The user sends a single query for information to the query agent. The agent knows what resources are available and what kind of information they contain, and forwards the query only to the appropriate resources. The user then interacts with the resources one by one. Right: A query/response agent both knows what resources are available and how responses from those resources are formatted. It is possible to reformat the responses into a uniform presentation for the user.


Query/response agents: “Best”. A query/response agent is a web service that provides all of the services of a query agent, but in addition is able to collect the responses from the information services and integrate them for presentation to the user. This integration may require conversion of metadata from one set of units to another and other reformatting. The query/response agent presents the user with metadata from, and links to, the data at various distributed services. In principle the query/response agent could also become a data agent, collecting data from various services, converting data to a user-specified format, and even performing such complex functions as spatial or spectral rebinning to facilitate comparison. However, even within the discipline of astronomy and astrophysics, data are taken in many different ways, using instruments of widely varying intrinsic resolution and sensitivity. As a result, automated conversions and comparisons are likely to be meaningless. It is critical for the user to understand the source and “pedigree” of the data in order to make proper use of them, and in order to know whom to contact for further information about them. Thus, it is not clear that a general data agent function is either practical or in the best interest of the user. Data agents for specific sub-fields (e.g., CCD imaging) could be developed, though care would still need to be taken in checking for comparable spatial resolution.
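The response-integration step can be illustrated with a short Python sketch. It assumes two hypothetical services that return the same kind of observation record under different column names and units (service B is assumed to report radians and minutes); the `normalize_*` functions play the role of response profiles. In keeping with the argument above, the sketch regularizes only metadata and keeps each row’s source and link, rather than touching the data themselves.

```python
import math

# Hypothetical raw responses from two services describing the same kind of
# record with different column names and units.
RESPONSE_A = [{"RA": 10.684, "DEC": 41.269, "EXPTIME_S": 1200.0,
               "link": "http://service-a.example.org/obs/1"}]
RESPONSE_B = [{"right_ascension": 0.18648, "declination": 0.72028,
               "exposure_min": 30.0, "link": "http://service-b.example.org/obs/7"}]

def normalize_a(rec):
    # Service A already uses decimal degrees and seconds.
    return {"ra_deg": rec["RA"], "dec_deg": rec["DEC"],
            "exposure_s": rec["EXPTIME_S"], "link": rec["link"]}

def normalize_b(rec):
    # Service B (assumed) reports radians and minutes.
    return {"ra_deg": math.degrees(rec["right_ascension"]),
            "dec_deg": math.degrees(rec["declination"]),
            "exposure_s": rec["exposure_min"] * 60.0, "link": rec["link"]}

def integrate(responses):
    """Merge per-service responses into one uniform table, tagging every row
    with the service it came from so that provenance is never lost."""
    merged = []
    for source, (records, normalize) in responses.items():
        for rec in records:
            row = normalize(rec)
            row["source"] = source
            merged.append(row)
    return merged

table = integrate({"service-a": (RESPONSE_A, normalize_a),
                   "service-b": (RESPONSE_B, normalize_b)})
for row in table:
    print(row["source"], round(row["ra_deg"], 3), row["exposure_s"], row["link"])
```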

3. The key to integration: profiles

To bridge the differences between the various information services and databases, we require a common language for expressing queries that can at some stage be translated into the specific languages used by each of the databases. For a complex system, it is helpful to consider this language in two parts: the profile and the protocol. The profile defines the concepts that can be expressed within queries and responses. The protocol defines how the concepts are represented and passed between the clients that pose the queries and the databases that answer them. This division is advantageous because the protocol is closely tied to the technology that is used to implement the system, while the profile can be made technology independent and may be implemented with many different technologies.
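The profile/protocol split can be made concrete with a small Python sketch: a profile-level `Constraint` names a concept, an operator, and a value, and a separate binding function renders it for one transport. Both the class and the `attribute.operator=value` query-string convention are hypothetical; the only real API used is `urllib.parse.urlencode`.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

@dataclass(frozen=True)
class Constraint:
    """A profile-level selection criterion. It names a concept, an operator
    and a value, and says nothing about how it will be transmitted."""
    attribute: str
    operator: str
    value: str

def to_http_get(base_url, constraints):
    """One possible protocol binding: encode the constraints as an HTTP GET
    query string. The 'attribute.operator=value' convention is hypothetical."""
    params = [(f"{c.attribute}.{c.operator}", c.value) for c in constraints]
    return base_url + "?" + urlencode(params)

query = [Constraint("DATA_TYPE", "eq", "image"), Constraint("ENERGY_RANGE", "eq", "UV")]
print(to_http_get("http://example.org/search", query))
# http://example.org/search?DATA_TYPE.eq=image&ENERGY_RANGE.eq=UV
```

A binding for another transport, such as Z39.50 or an XML document, would be a second function that consumes the same `Constraint` objects. The profile layer would not change.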

The primary component of a profile is the definition of the set of standard query attributes – the concepts that can be used to state selection criteria. In the space sciences these attributes will include concepts such as sky position or location in space, frequency or bandpass, data type, time of observation, mission or observatory, and other key identifiers. The profile also defines the operations that a user may invoke on these criteria, such as “equals” and “greater than”. It is sometimes appropriate to define a controlled vocabulary associated with an attribute; for example, recognized data types might include “image”, “catalog entry”, “reference”, “spectrum”, etc. The profile can also define Boolean operators and extended options that allow one to express complex queries. In defining the profile one must maintain a balance between making the profile descriptive enough to permit effective searches and minimizing the burden on the data providers who must provide these profiles. For example, the Z39.50 BIB-1 profile used in bibliographic work defines over 100 concepts (ftp://ftp.funet.fi/pub/doc/library/z3950/bib1/bib1sem.txt). Such a complex profile would strongly discourage the space science community from participating in an integrated data system. One way to approach the profile definition is to divide the concept space into sub-profiles. At the highest level, a profile with a limited number of terms common to all of space science would be used to identify primary resources. Sub-profiles could become increasingly specific and detailed as they map onto sub-disciplines, or, at the lowest levels, onto individual missions or data sets. A hierarchical organization of profiles allows cross-discipline or cross-concept searching. The implementation cost to data providers can be minimized, as they can choose which profiles to support based on which are applicable to their databases.
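A hierarchical profile with operators and a controlled vocabulary might be represented as in the following Python sketch. The profile names (`space-science`, `astronomy`), the attribute set, and the dictionary layout are assumptions made for illustration, not a defined ISAIA schema. A sub-profile inherits its parent’s terms, so a coordinate query validates under the astronomy sub-profile but is rejected by the top-level profile.

```python
# Hypothetical two-level profile hierarchy (names and terms are illustrative).
PROFILES = {
    "space-science": {                      # top level: terms common to all disciplines
        "parent": None,
        "attributes": {
            "OBJECT_NAME": {"ops": {"equals"}},
            "DATA_TYPE": {"ops": {"equals"},
                          "vocabulary": {"image", "catalog entry", "reference", "spectrum"}},
            "TIME": {"ops": {"equals", "greater than", "less than"}},
        },
    },
    "astronomy": {                          # sub-profile adds celestial coordinates
        "parent": "space-science",
        "attributes": {
            "RA": {"ops": {"equals", "greater than", "less than"}},
            "DEC": {"ops": {"equals", "greater than", "less than"}},
        },
    },
}

def resolve(profile):
    """Collect the attribute definitions of a profile and all its ancestors."""
    attributes = {}
    while profile is not None:
        node = PROFILES[profile]
        for name, spec in node["attributes"].items():
            attributes.setdefault(name, spec)   # a sub-profile may override a parent term
        profile = node["parent"]
    return attributes

def validate(profile, constraints):
    """Check (attribute, operator, value) triples against a profile; raise
    ValueError for unknown attributes, disallowed operators, or values
    outside a controlled vocabulary."""
    attributes = resolve(profile)
    for attribute, operator, value in constraints:
        spec = attributes.get(attribute)
        if spec is None:
            raise ValueError(f"{attribute} is not defined in profile {profile!r}")
        if operator not in spec["ops"]:
            raise ValueError(f"operator {operator!r} is not allowed for {attribute}")
        vocabulary = spec.get("vocabulary")
        if vocabulary is not None and value not in vocabulary:
            raise ValueError(f"{value!r} is not a recognized {attribute}")

# A coordinate query is valid under the astronomy sub-profile ...
validate("astronomy", [("RA", "greater than", "10.0"), ("DATA_TYPE", "equals", "image")])
# ... but not under the top-level profile, which has no RA term.
try:
    validate("space-science", [("RA", "greater than", "10.0")])
except ValueError as err:
    print(err)
```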
Hierarchical profiles provide the flexibility for searching across all of space science and the extensibility for focusing on details that may apply only to certain sub-disciplines. For example, a researcher could ask the question, “What data exist about the magnetosphere of Jupiter?”, and send the query to all databases within space science. Such a question would be expressed using a general, top-level profile that includes the attribute object name, which all databases across space science might support in some way. A more specific question, such as “What X-ray data can be found within some radius of a given celestial position?”, might involve attributes of a more specific profile, like celestial coordinates, and therefore could be sent only to those databases (in this case, celestial-coordinate-based astrophysical databases) for which the attributes make sense. Even smaller subsets of databases could support highly detailed profiles to support very specific search and retrieval capabilities. There are three profile components: resource profiles, query profiles, and response profiles.

Resource profiles characterize the data holdings in a given service. The primary purpose of the resource profile is to allow the query agent to determine which sites and services to send queries to. The resource profile functions as a filter and avoids having queries sent across the network to possibly hundreds or thousands of services which do not have data of interest to the user.

Query profiles provide a detailed mapping of generic query terms to site- or service-specific terms. For example, in astrophysics the celestial coordinates of right ascension and declination are fundamental for giving the position of an object on the sky. One service may label these coordinates as “RA” and “DEC” in its database, and another may label them “right_ascension” and “declination”. They may also be expressed in different units in different databases (decimal degrees, radians, or character strings in sexagesimal notation: hh:mm:ss.s, dd:mm:ss.s). In space physics data, time is often a key search criterion, and different resources specify observation times in different systems, units, and notations.

Response profiles label the metadata that come with a query response in order to facilitate integration of the results from different services. Response profiles are also used to define the type of information being returned (plain text, HTML, or other structured records).

Although the term definitions in each of these profiles are likely to be the same or similar, it is useful to distinguish between them from the perspective of implementation. For example, information service providers could choose to support only the resource profile and the query profile. This would make their data services visible and queryable in an integrated system, but would not allow their query responses to be handled automatically. Further information on profiles is given by Plante et al. [14] and Plante [13]. Example profile terms are shown in Table 1. Note that these terms are intended to be illustrative only. A formal profile definition will utilize a structured and strictly controlled vocabulary, both for field names and field values.

4. Communication protocols, metadata interchange standards

Agreeing upon the profiles for information resources, how those resources are queried, and how metadata is returned are the first steps in actually developing a distributed information service. The next step is to implement the profiles within the framework of a communications protocol and with a format supported within the space science community. The obvious protocol for such a system is HTTP, and, as will be described later, a system for data discovery in astrophysics has already been implemented using HTTP streams interfaces. Z39.50 is a well-supported query transport protocol that is used widely in the library community, and is a protocol that has been recommended for use in the astronomy community [8], but it has not yet gained acceptance in astronomy or space science in general. More complex profiles and metadata could be expressed in the FITS format (see http://fits.gsfc.nasa.gov/ and references given there), which is a standard throughout the astronomy community, or in one of the several formats widely used in the space physics community (CDF, netCDF, SFDU, HDF, etc.). The selection of a discipline-specific format is likely to inhibit broad participation in a distributed service, however. The Extensible Markup Language, XML (http://www.w3.org/XML/), is poised to become the new lingua franca of the Internet. Using XML one can define both the structure and content of a document, and for our purposes that document can be a resource, query, or response profile. Within the astronomy community considerable work is already being done on XML:

• The Astronomical Data Center at NASA/Goddard Space Flight Center has developed a draft standard for annotation of astronomical catalogs and is leading a project to standardize the mark-up of tabular data in astronomical journals ([15], and http://adc.gsfc.nasa.gov/xml/).
• A draft Astronomical Markup Language, AML, has been developed ([12], and http://www.infm.ulst.ac.uk/~damien/these/).
• The Astronomical Instrument Markup Language, AIML, is being developed to support the control of astronomical instrumentation and subsequent data processing ([1], and http://pioneer.gsfc.nasa.gov/public/aiml/).
• The Gemini 8-meter Telescope Project is using XML for processing of observing proposals (ftp://ftp.gemini.edu/pub/gemini_Central/OCS/phase1-1dec98.tar.Z).

Table 1
Sample resource profile for space sciences. Terms in the profile at this level are intended to pertain to space science data sets in general. Additional terms are introduced in sub-profiles for specific disciplines, allowing users both broad and deep access to space science resources. Data attributes are fields which, when

Data attributes:
  FACILITY                     name of observatory, mission, program, etc.
  DISCIPLINE                   astronomy, space physics, planetary science, solar physics
  INSTRUMENT HOST              name of telescope (HST, IUE, COBE, ...)
  INSTRUMENT NAME              name of instrument (WFPC, NICMOS, FIRAS, ...)
  INSTRUMENT TYPE              magnetometer, spectrometer, imager, photometer, ...
  OBSERVED PHYSICAL QUANTITY   photon, electron, proton, ion, atom, molecule, magnetic field, electric field, pressure, temperature, ...
  SAMPLING MODE                time series, image, aperture, spectrum, visibility, scan, ...
  DATA CLASS                   pointed observation, survey observation, derived (catalog), simulation, model fit, ephemeris, software, literature
  DATA FORMAT                  FITS, CDF, PDS, HDF, ASCII, ...
  TIME SPAN                    range of times covered by resource
  PRINCIPAL INVESTIGATOR       name of PI for INSTRUMENT NAME
  OBJECT NAME                  astronomical object name, planet name, region of space
  ENERGY RANGE                 optical, UV, IR, 2–10 keV, ...

Resource attributes:
  RESOURCE NAME                name of the data resource described by this profile
  DATA SERVICE SITE            name of the data center providing the data
  CURATOR                      name of person or organization responsible for knowledge of the data
  RESOURCE URL                 URL for accessing the resource for end-users
  SUB-PROFILES                 identifiers for sub-profiles supported by this resource
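Since XML is proposed as the vehicle for profiles, a Table 1-style resource profile could be serialized as in the Python sketch below, which uses the standard `xml.etree.ElementTree` module. The element names (Table 1 terms with spaces replaced by underscores), the `resourceProfile`/`dataAttributes`/`resourceAttributes` wrapper elements, and the sample values are assumptions for illustration; this is not a published ISAIA or GLU format.

```python
import xml.etree.ElementTree as ET

def resource_profile_xml(data_attrs, resource_attrs):
    """Serialize Table 1-style attributes as a small XML document.
    Element names are Table 1 terms with spaces replaced by underscores;
    this schema is hypothetical."""
    root = ET.Element("resourceProfile")
    for group_name, attrs in (("dataAttributes", data_attrs),
                              ("resourceAttributes", resource_attrs)):
        group = ET.SubElement(root, group_name)
        for term, value in attrs.items():
            values = value if isinstance(value, list) else [value]
            for v in values:   # multi-valued terms become repeated elements
                ET.SubElement(group, term.replace(" ", "_")).text = v
    return ET.tostring(root, encoding="unicode")

xml_text = resource_profile_xml(
    {"FACILITY": "HST", "DISCIPLINE": "astronomy",
     "SAMPLING MODE": ["image", "spectrum"], "DATA FORMAT": "FITS"},
    {"RESOURCE NAME": "HST archive", "DATA SERVICE SITE": "STScI",
     "RESOURCE URL": "http://archive.stsci.edu"},
)
print(xml_text)
```

Because the document is plain XML, a query agent could read such a profile without any discipline-specific library, which is the advantage over FITS or CDF argued above.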

5. A distributed database of resources and their associated query and response profiles

The success of the Internet is largely a result of having information about network nodes – IP addresses and their name equivalents – distributed to a more-or-less hierarchical organization of name servers. Similarly, the information describing space science data services needs to be managed in a distributed manner. The resource, query, and response profiles should be maintained by the sites responsible for providing the corresponding services, yet the profiles themselves must be widely available in order to facilitate efficient queries, even when particular sites or services are temporarily unavailable.

The Générateur de Liens Uniformes (GLU) protocols were developed at the CDS to provide essentially a URN service for Web pages ([7], http://simbad.u-strasbg.fr/glu/glu.htx). However, in developing the system the CDS considered not only the issues of how to provide a service which resolves names to URLs, but also how to describe what a given URL does. Thus GLU allows one to indicate that a given resource exists and that it takes parameters of given types, and to add any descriptive information about the resource that the user desires. The GLU system is designed to be robust in the face of uncertain connections in the Web. For example, GLU manages mirror sites (sites where certain data services are duplicated for ease of access), regularly checks connections, and directs links to the ‘closest’ mirror (i.e., the one providing the best response), as determined from these checks. Using the GLU, sites describe resources that they provide (or resources they know of from non-GLU sites), and the GLU network assures that all of the participating sites have an up-to-date record of these descriptions. There are no master sites: if connectivity is temporarily broken, only updates are deferred, and the system continues to work with the last received data. An example GLU configuration is shown in Fig. 2. Work has now begun at the CDS to support XML-based resource descriptions in the GLU system.
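Two of the GLU behaviours described above, keeping the last received description when peers are unreachable and steering users to the best-responding mirror, can be modelled in a few lines of Python. This is a conceptual sketch only. It does not reproduce the GLU protocol, and `ResourceRegistry`, `closest_mirror`, and the mirror host names are hypothetical.

```python
import time

class ResourceRegistry:
    """Hypothetical local replica of a distributed resource database.

    Each participating site keeps the last description it received for every
    resource. If a peer is unreachable, updates are simply deferred and
    lookups keep using the last received description (there is no master)."""

    def __init__(self):
        self._records = {}   # resource name -> (description, time received)

    def receive_update(self, name, description, received_at=None):
        when = time.time() if received_at is None else received_at
        self._records[name] = (description, when)

    def lookup(self, name):
        description, _received = self._records[name]
        return description

def closest_mirror(mirrors, response_times):
    """Return the mirror with the best (smallest) measured response time.
    Mirrors missing from response_times failed their last check and are skipped."""
    reachable = [m for m in mirrors if m in response_times]
    if not reachable:
        raise LookupError("no mirror responded to the last connectivity check")
    return min(reachable, key=lambda m: response_times[m])

registry = ResourceRegistry()
registry.receive_update("catalog-service",
                        {"mirrors": ["mirror-a.example.org", "mirror-b.example.org"]})
mirrors = registry.lookup("catalog-service")["mirrors"]
print(closest_mirror(mirrors, {"mirror-a.example.org": 0.9, "mirror-b.example.org": 0.3}))
```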

Fig. 2. A distributed GLU database. Four sites are contributing to the database and six receiving information. GLU is configurable such that a site need not maintain elements of the database that are not of interest (e.g., here SAO does not subscribe to planetary data). In a distributed information service for the space sciences, information providers would each contribute to the GLU database, but there could be many more sites subscribing to the database for the purpose of locating information services.

6. A first generation query agent: Astrobrowse

The concepts described here have been demonstrated, at least in part, in a distributed data location service for astronomy: Astrobrowse. Astrobrowse was conceived by Steve Murray and myself in 1995, and we organized a workshop of astronomy information providers to flesh out the concept and prototype the system (http://hea-www.harvard.edu/adccc/astrobrowse_wp.html). Astrobrowse is an implementation of the query agent (the “better” model for a distributed information service) using a combined and somewhat limited resource/query profile. Some 1000 astronomical resources are known to Astrobrowse, and their characteristics and query mechanisms are documented in GLU records for each resource. There are three fully functional user interfaces for the Astrobrowse system: http://legacy.gsfc.nasa.gov/ab [11,9], http://simbad.u-strasbg.fr/glu/cgi-bin/astroglu.pl [5], and http://archive.stsci.edu/starcast. Each is layered on the same distributed GLU database of astronomical resources.

Astrobrowse does not have a strictly defined vocabulary for the resource profiles, a deficiency which forces the user interface builder to encode knowledge of the nature of the resources into the interface. Also, Astrobrowse implements essentially only one component of a query profile: celestial coordinates (right ascension and declination). Since the majority of astronomical information services rely on searching by target position, this is not as big a problem as it might seem. Searches by object name are notoriously unreliable; astronomers have many systems of nomenclature, with the result that the same object can have many different names. Thus, despite the rather restricted nature of the Astrobrowse implementation, it demonstrates that a data location service can be very successful with even a limited query vocabulary. Astrobrowse also demonstrates the ability to transform search attributes, in this case celestial coordinates, into a number of different representations – the Astrobrowse query agent converts its standard coordinate format into the various formats required by the participating services (including change of epoch and equinox).

Within the planetary sciences community a similar service, actually more complete than Astrobrowse, has also been implemented. The Planetary Data System has developed a Distributed Inventory System (DIS) for its data holdings [10]. This has been possible because all PDS data sets utilize the same data dictionary, a situation not enjoyed in astronomy or solar and space physics. The user interface to DIS, PDSBrowse (http://pds.jpl.nasa.gov/pdsbrows.htm), allows users to locate data throughout the distributed PDS databases with a single query.
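The coordinate translation performed by the Astrobrowse query agent, and the query-profile mapping described in Section 3, can be sketched in Python as follows. The three site profiles (`site-1`, `site-2`, `site-3`), their parameter names, and the output precisions are hypothetical, chosen to mirror the “RA”/“DEC”, “right_ascension”/“declination”, and sexagesimal examples in the text. Unlike Astrobrowse, this sketch does not change epoch or equinox; it only renames parameters and changes representation.

```python
import math

def to_sexagesimal(value_deg, hours=False):
    """Format decimal degrees as 'dd:mm:ss.s', or as 'hh:mm:ss.s' when
    hours=True (right ascension: 15 degrees per hour of time)."""
    value = value_deg / 15.0 if hours else value_deg
    sign = "-" if value < 0 else ""
    # Count in tenths of a second so that rounding carries into minutes/units.
    tenths = round(abs(value) * 36000)
    whole_seconds, tenth = divmod(tenths, 10)
    units, remainder = divmod(whole_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{sign}{units:02d}:{minutes:02d}:{seconds:02d}.{tenth}"

# Hypothetical query profiles: each site's parameter names and coordinate format.
QUERY_PROFILES = {
    "site-1": {"ra": "RA", "dec": "DEC", "format": "degrees"},
    "site-2": {"ra": "right_ascension", "dec": "declination", "format": "radians"},
    "site-3": {"ra": "ra", "dec": "dec", "format": "sexagesimal"},
}

def translate(site, ra_deg, dec_deg):
    """Turn the agent's standard (RA, Dec in decimal degrees) into the
    parameter names and representation a given site expects.
    Epoch/equinox changes are NOT handled in this sketch."""
    profile = QUERY_PROFILES[site]
    fmt = profile["format"]
    if fmt == "degrees":
        ra, dec = f"{ra_deg:.6f}", f"{dec_deg:.6f}"
    elif fmt == "radians":
        ra, dec = f"{math.radians(ra_deg):.8f}", f"{math.radians(dec_deg):.8f}"
    elif fmt == "sexagesimal":
        ra, dec = to_sexagesimal(ra_deg, hours=True), to_sexagesimal(dec_deg)
    else:
        raise ValueError(f"unknown coordinate format {fmt!r}")
    return {profile["ra"]: ra, profile["dec"]: dec}

for site in QUERY_PROFILES:
    print(site, translate(site, 10.684708, 41.269065))
```

Adding a new participating service then means adding one entry to the query-profile table, not changing the agent’s code. That is the practical payoff of keeping query profiles as data.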

7. A query/response agent: ISAIA

We have begun work on a successor to Astrobrowse, known as ISAIA (Interoperable Systems for Archival Information Access). ISAIA will operate as a query/response agent. It will perform Astrobrowse's task of locating data services, and it will also handle the responses from those services. ISAIA will also be broader in scope than Astrobrowse, encompassing information services in the space sciences and complementary sources of data from ground-based facilities. Wherever possible, ISAIA will be layered upon existing systems, using resource, query, and response profiles to build a common interface.

ISAIA development will consist of four major phases: (1) definition of the profiles, (2) implementation of a generalized query agent, (3) implementation of a query/response agent, and (4) development of an integrator. The goal of the ISAIA program is to build systems that can integrate the results of queries against many different resources and that make remote resources appear local to a given user. ISAIA's data integrator must regularize the information returned in response to queries, ensuring consistency in units, projections, formats, etc. The integrator must also maintain links so that the user can understand the provenance of the data.

The role of the ISAIA integrator is to let users work effectively with results from multiple sources. Using the information in the profiles together with the returned results, the integrator must be able to:
1. Convert units, e.g., from wavelength to frequency.
2. Translate coordinate systems, e.g., from Galactic to equatorial coordinates or from heliocentric to geocentric positions.
3. Regularize formats. Different tables often present the same type of metadata in different formats, possibly spread across several fields. An example is time, which may be split into day and time of day in one table but given in a single field in another.
4. Reorganize tables. Even when tables provide the same information, comparisons can be difficult if the data are sorted in a different order or if the columns to be compared are not conveniently highlighted.
5. Store intermediate results. Scientists must be able to interrupt a session and return to it without having to re-enter the queries that established its current state.
6. Cross-correlate tables. In many cases the role of the integrator is to hide the existence of multiple tables from the user. Users can ask questions of the form “what targets have been observed by both X and Y?” or “what targets have been observed by X and not by Y?”.
7. Transparently access archival data. While the integrator itself works primarily with the tables and metadata that index archives, it must also provide users with convenient access to the archive information that corresponds to the rows they select. Users will be able to see at least a minimal description of the kinds of archive information available. Where they exist, quick-look data products such as GIF images or other previews will be highlighted.
8. Give explicit credit to the data provider and maintain data integrity. The system must indicate when the metadata have been reprocessed by the integrator and must allow users to access data directly from the original provider.

The integrator is the controller that takes the user's requests, queries remote sites, translates the results, and feeds a local database. It sends outputs to the user interface for display. The integrator can also take data from the local database and invert the process, providing ISAIA-compatible records. Initially, the integrator provides the user interface with descriptions of the available options, which it determines by examining the GLU database of space science resources. After the user submits the request and the integrator has marshaled the results, the integrator provides the user interface with descriptions of the results as well as the regularized metadata. The integrator incorporates a parser that analyzes the profile metadata and determines the steps needed to transform the incoming data streams into the formats desired by the user. It uses an internal database system to manage state, to perform cross-correlations among results obtained from multiple sources, and to store intermediate results.

Many of these capabilities can easily be added to existing archival systems, but two major issues need to be resolved. First, existing systems must be adapted to the metadata protocols used within ISAIA.
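As an illustration of what adapting a site's records to common metadata might involve, the following Python sketch shows one possible profile-driven integrator step. It is a minimal sketch under stated assumptions, not ISAIA's implementation: the `Profile` class, the `regularize` and `cross_correlate` functions, the column names, the required labels, and the two sites "SiteX" and "SiteY" with their records are all hypothetical, invented for this example, and are not part of ISAIA, GLU, or Astrobrowse. Each profile maps site-specific column names onto common labels and records the site's wavelength unit; the integrator uses it to merge split date and time-of-day fields, convert wavelength to frequency (ν = c/λ), tag provenance, drop records that lack required metadata, and answer “observed by both X and Y” and “observed by X and not by Y” questions with set operations.

```python
# Hypothetical sketch of a profile-driven integrator step; not ISAIA's code.
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

C_M_PER_S = 299_792_458.0  # speed of light in m/s

# Common labels every usable record must end up with (an assumption).
REQUIRED = ("target", "obs_time", "frequency_hz")


@dataclass
class Profile:
    """Hypothetical site profile: maps site columns onto common labels."""
    site: str                       # data provider, kept for provenance and credit
    columns: dict[str, str]         # site-specific column name -> common label
    wavelength_unit_m: float = 1.0  # metres per unit of the site's wavelength column
    split_time: bool = False        # True if date and time of day are separate fields


def regularize(raw: dict, profile: Profile) -> Optional[dict]:
    """Translate one site record into common metadata, or return None."""
    rec = {profile.columns[k]: v for k, v in raw.items() if k in profile.columns}
    reprocessed = False
    # Regularize formats: merge a split date / time-of-day pair into one timestamp.
    if profile.split_time and "date" in rec and "time_of_day" in rec:
        rec["obs_time"] = datetime.fromisoformat(
            f"{rec.pop('date')}T{rec.pop('time_of_day')}")
        reprocessed = True
    elif "obs_time" in rec:
        rec["obs_time"] = datetime.fromisoformat(rec["obs_time"])
    # Convert units: wavelength in the site's unit -> frequency in Hz (nu = c / lambda).
    if "wavelength" in rec:
        lam_m = float(rec.pop("wavelength")) * profile.wavelength_unit_m
        rec["frequency_hz"] = C_M_PER_S / lam_m
        reprocessed = True
    # Graceful fall-through: skip records lacking metadata the query needs.
    if any(label not in rec for label in REQUIRED):
        return None
    # Credit the provider and flag records the integrator has reprocessed.
    rec["provenance"] = profile.site
    rec["reprocessed"] = reprocessed
    return rec


def cross_correlate(a: list[dict], b: list[dict]) -> tuple[set, set]:
    """Return (targets seen by both a and b, targets seen by a but not b)."""
    ta = {r["target"] for r in a}
    tb = {r["target"] for r in b}
    return ta & tb, ta - tb


# Two invented sites with different column names, units, and time layouts.
prof_x = Profile("SiteX", {"OBJECT": "target", "DATE_OBS": "obs_time",
                           "WAVE_NM": "wavelength"}, wavelength_unit_m=1e-9)
prof_y = Profile("SiteY", {"name": "target", "day": "date", "ut": "time_of_day",
                           "lambda_um": "wavelength"},
                 wavelength_unit_m=1e-6, split_time=True)
x_rows = [{"OBJECT": "M31", "DATE_OBS": "1998-11-02T04:10:00", "WAVE_NM": "550"},
          {"OBJECT": "M87", "DATE_OBS": "1999-03-15T22:00:00", "WAVE_NM": "550"}]
y_rows = [{"name": "M31", "day": "1999-01-20", "ut": "01:30:00", "lambda_um": "2.2"},
          {"name": "NGC 1068", "day": "1999-02-11", "ut": "03:45:00"}]  # no wavelength

x = [r for r in (regularize(row, prof_x) for row in x_rows) if r is not None]
y = [r for r in (regularize(row, prof_y) for row in y_rows) if r is not None]
both, x_not_y = cross_correlate(x, y)
print(sorted(both), sorted(x_not_y))  # ['M31'] ['M87']
```

Dropping incomplete records is only one fall-through strategy an integrator might adopt; an alternative is to query on whatever metadata are present. A fuller integrator would also need coordinate transformations and persistent storage of intermediate results, which this sketch omits.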
In some cases the existing system may not supply all of the information expected by ISAIA, and a graceful fall-through will be needed (e.g., either ignoring records for which there is insufficient information or making queries on the basis of the metadata that are present). Second, most current Web interfaces do not provide mechanisms for the ingest and storage of user-generated results. For ISAIA to be fully effective, tables generated by ISAIA need to be treated in the same fashion as native tables. Neither of these problems is terribly difficult to solve, but both will need to be addressed in the local context of each existing system. We hope that new systems will be built with ISAIA compatibility ab initio.

8. Conclusions

The most profound change that placing data on the Web has brought is not the vast bulk of information that is there. Rather, we can now treat data as an immanent quantity, ever-present and permeating the entire scientific community. With the development of systems like ISAIA (and Astrobrowse before it), scientists should need to do little more than think of what interests them in order to have it presented before them. With ISAIA, the Web becomes a large information system whose contents are tailored to individual users. ISAIA will organize this information in a way that allows the user to concentrate on scientific problems rather than on the details of where information is or what format it is in.

Acknowledgments

This paper draws upon the experience and knowledge of all members of the ISAIA project team: T. McGlynn, C. Heikkila, and N. White (NASA/GSFC, HEASARC), J. King (NASA/GSFC, NSSDC), R.A. White, C. Cheung, and E. Shaya (NASA/GSFC, ADC), R. Plante and R. McGrath (NCSA/UIUC), J. Mazzarella (IPAC/Caltech), A. Rots (SAO), S. McMahon and S. Hughes (JPL/PDS), M. A’Hearn (U.Md.), R. Beebe (NMSU), F. Genova (CDS), and P. Giommi (BSDC). The work on ISAIA profiles has benefited greatly from discussions with colleagues at the Observatoire Astronomique de Strasbourg (OAS) and the Centre de Données astronomiques de Strasbourg (CDS) during my visit there in the summer of 1999: F. Ochsenbein, M. Wenger, P. Fernique, P. Dubois, and F. Genova. I am most appreciative of the support of the CDS and OAS during this period, and I thank CDS director F. Genova and OAS director D. Egret for their hospitality.
The ISAIA project is supported by NASA’s Applied Information Systems Research Program under grants to the Space Telescope Science Institute, the National Center for Supercomputing Applications/University of Illinois, and the Goddard Space Flight Center.



Acronyms

ADC       Astronomical Data Center (NASA/GSFC)
AIML      Astronomical Instrument Markup Language
AML       Astronomical Markup Language
CDF       Common Data Format
CDS       Centre de Données astronomiques de Strasbourg
COBE      Cosmic Background Explorer
DIS       Distributed Inventory System
FIRAS     Far Infrared Absolute Spectrophotometer
FITS      Flexible Image Transport System
GLU       Générateur des Liens Uniformes
GSFC      Goddard Space Flight Center (NASA; Greenbelt, Maryland)
HDF       Hierarchical Data Format (NCSA)
HEASARC   High Energy Astrophysics Science Archive Research Center (NASA/GSFC)
HST       Hubble Space Telescope
HTML      Hypertext Markup Language
HTTP      Hypertext Transfer Protocol
IPAC      Infrared Processing and Analysis Center (Caltech)
IR        infrared
IRSA      Infrared Science Archive
ISAIA     Interoperable Systems for Archival Information Access
IUE       International Ultraviolet Explorer
JPL       Jet Propulsion Laboratory (Pasadena, California)
MAST      Multimission Archive at Space Telescope (STScI)
NASA      National Aeronautics and Space Administration
NCSA      National Center for Supercomputing Applications
NED       NASA Extragalactic Database (IPAC)
NICMOS    Near-Infrared Camera and Multi-Object Spectrograph
NSSDC     National Space Science Data Center (NASA/GSFC)
PDS       Planetary Data System (NASA/JPL)
PI        principal investigator
SAO       Smithsonian Astrophysical Observatory
SDAC      Solar Data Analysis Center (NASA/GSFC)
SFDU      Standard Formatted Data Unit

References

[1] T.J. Ames, K.B. Sall, C.E. Warsaw, NASA’s Instrument Control Markup Language (ICML), in: Astron. Soc. Pacific Conf. Ser. 172, Astronomical Data Analysis Software and Systems VIII, D.M. Mehringer, R.L. Plante, D.A. Roberts (Eds.), 1999, pp. 103–106.
[2] H. Andernach, Internet services for professional astronomy, in: Astrophysics with Large Databases in the Internet Age, Proc. 9th Canary Islands Winter School on Astrophysics, M. Kidger, I. Pérez-Fournon, F. Sánchez (Eds.) (Cambridge University Press, 1999). Preprint: http://xxx.lanl.gov/abs/astro-ph/9807167/.
[3] H. Andernach, R.J. Hanisch, F. Murtagh, Network resources for astronomers, Pub. Astron. Soc. Pacific 106 (1994) 1190–1216.
[4] P.B. Boyce, H. Dalterio, Electronic publishing of scientific journals, Physics Today (January 1996).
[5] D. Egret, P. Fernique, F. Genova, Prototype of a discovery tool for querying heterogeneous services, in: Astron. Soc. Pacific Conf. Ser. 145, Astronomical Data Analysis Software and Systems VII, R. Albrecht, R.N. Hook, H.A. Bushouse (Eds.), 1998, pp. 416–419. http://ecf.hq.eso.org/adass/adassVII/egretd1.html.
[6] G. Eichhorn, An overview of the astrophysics data system, Experimental Astronomy 5 (1994) 205–220.
[7] P. Fernique, F. Ochsenbein, M. Wenger, CDS GLU, a tool for managing heterogeneous distributed web services, in: Astron. Soc. Pacific Conf. Ser. 145, Astronomical Data Analysis Software and Systems VII, R. Albrecht, R.N. Hook, H.A. Bushouse (Eds.), 1998, pp. 466–469. http://ecf.hq.eso.org/adass/adassVII/ferniquep.html.
[8] K. Gamiel, R. McGrath, R. Plante, Distributed searching of astronomical databases with Pizazz, in: Astron. Soc. Pacific Conf. Ser. 145, Astronomical Data Analysis Software and Systems VII, R. Albrecht, R.N. Hook, H.A. Bushouse (Eds.), 1998, pp. 375–377. http://ecf.hq.eso.org/adass/adassVII/gamielk.html.
[9] C.W. Heikkila, T.A. McGlynn, N.E. White, Astrobrowse: A web agent for querying astronomical databases, in: Astron. Soc. Pacific Conf. Ser. 172, Astronomical Data Analysis Software and Systems VIII, D.M. Mehringer, R.L. Plante, D.A. Roberts (Eds.), 1999, pp. 221–224.
[10] J.S. Hughes, S.K. McMahon, The planetary data system – distributed inventory system, in: Proceedings, IEEE Forum on Research and Technical Advances in Digital Libraries, 1999.
[11] T.A. McGlynn, N.E. White, Astrobrowse: A multi-site, multiwavelength service for locating astronomical resources on the web, in: Astron. Soc. Pacific Conf. Ser. 145, Astronomical Data Analysis Software and Systems VII, R. Albrecht, R.N. Hook, H.A. Bushouse (Eds.), 1998, pp. 481–484. http://ecf.hq.eso.org/adass/adassVII/mcglynnt.html.
[12] F. Murtagh, D. Guillaume, Distributed information search and retrieval for astronomical resource discovery and data mining, in: Astron. Soc. Pacific Conf. Ser. 153, Library and Information Services in Astronomy III, U. Grothkopf, H. Andernach, S. Stevens-Rayburn, M. Gomez (Eds.), 1998, pp. 51–60. http://www.eso.org/gen-fac/libraries/lisa3/murtaghf.html.
[13] R.L. Plante, A prototype profile set for astronomy, 1997. http://monet.astro.uiuc.edu/~rplante/topics/P30/profile.html.
[14] R.L. Plante, R.E. McGrath, J. Futrelle, A model for cross-database searching of distributed astronomical information resources, 1997. http://monet.astro.uiuc.edu/~rplante/topics/P30/sysmodel.html.
[15] E.J. Shaya, J.H. Blackwell, J.E. Gass, V.E. Kargatis, G.L. Schneider, K.D. Borne, C.Y. Cheung, R.A. White, Formatting journal tables in XML at the ADC, in: Astron. Soc. Pacific Conf. Ser. 172, Astronomical Data Analysis Software and Systems VIII, D.M. Mehringer, R.L. Plante, D.A. Roberts (Eds.), 1999, pp. 274–277.
[16] G. Squibb, The astrophysics data system, Report of a NASA Workshop, September 1987.