Supporting users through integrated retrieval, processing, and distribution systems at the Land Processes Distributed Active Archive Center


Acta Astronautica 56 (2005) 681 – 687 www.elsevier.com/locate/actaastro

Thomas Kalvelage∗, Jennifer Willems¹
US Geological Survey EROS Data Center, Sioux Falls, South Dakota 57198 USA
Received 14 June 2004; accepted 11 October 2004

Abstract

The US Geological Survey’s EROS Data Center (EDC) hosts the Land Processes Distributed Active Archive Center (LP DAAC). The LP DAAC supports NASA’s Earth Observing System (EOS), which is a series of polar-orbiting and low-inclination satellites for long-term global observations of the land surface, biosphere, solid Earth, atmosphere, and oceans. The EOS Data and Information System (EOSDIS) was designed to acquire, archive, manage, and distribute Earth observation data to the broadest possible user community. The LP DAAC is one of four DAACs that utilize the EOSDIS Core System (ECS) to manage and archive their data. Since the ECS was originally designed, significant changes have taken place in technology, user expectations, and user requirements. The LP DAAC has therefore implemented additional systems, tailored to an integrated working environment, to meet the evolving needs of scientific users. These systems provide a wide variety of services to improve data access and to enhance data usability through subsampling, reformatting, and reprojection. They also support the wide breadth of products handled by the LP DAAC. The LP DAAC is the primary archive for the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) data; it is the only facility in the United States that archives, processes, and distributes data from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) on NASA’s Terra spacecraft; and it is responsible for the archive and distribution of “land products” generated from data acquired by the Moderate Resolution Imaging Spectroradiometer (MODIS) on NASA’s Terra and Aqua satellites. © 2004 Published by Elsevier Ltd.

Paper IAC-03-B.6.01, presented at the 54th International Astronautical Congress, 28 September–3 October 2003, Bremen, Germany.
∗ Corresponding author. Tel.: +1 605 594 6556; fax: +1 605 594 6906. E-mail addresses: [email protected] (T. Kalvelage), [email protected] (J. Willems).
¹ Tel.: +1 605 594 2505; fax: +1 605 594 6906.

0094-5765/$ - see front matter © 2004 Published by Elsevier Ltd. doi:10.1016/j.actaastro.2004.10.009

1. Background

The Land Processes Distributed Active Archive Center (LP DAAC) is funded by NASA to archive and distribute NASA remotely sensed land data and derived data products acquired by the Enhanced Thematic Mapper Plus (ETM+), Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), and Moderate Resolution Imaging Spectroradiometer (MODIS) instruments on the Landsat 7, Terra, and Aqua Earth Observing System (EOS) satellites. The LP DAAC is one of four DAACs that utilize the EOS Data and Information System (EOSDIS) Core System (ECS) to manage and archive data gathered by EOS missions.

The ECS was originally designed to ensure that the remotely sensed data acquired by NASA were easily available to the science and applications community. The design resulted in a large distributed data system. ECS was originally conceived during the 1980s, and was contracted out in the early 1990s. The first operational use of the system was in support of the Landsat 7 mission at the LP DAAC in 1999.

The initial system was designed to support the science teams in conducting instrument calibration and data validation. Consequently, user support systems were designed mainly for expert and experienced users, primarily the instrument science teams and interdisciplinary science investigators. Although government, academic, and commercial users would also be allowed to use those systems, the learning curve involved in becoming an expert user was steeper than anticipated. The difficulty of searching and ordering products from such a complex data system resulted in broad criticism of EOSDIS.

2. EOSDIS Core functionality

Initially, the EOS Data Gateway (EDG) and the ASTER On Demand Form Request Manager (ODFRM) provided retrieval functionality and on-demand processing. The EDG was the standard method for searching and ordering all of the LP DAAC’s data sets, as well as over 1500 EOS data sets. The EDG is a menu-driven, search-oriented tool developed by NASA at the Goddard Space Flight Center (GSFC). It is integrated not only with the ECS, but also with a variety of other data systems. The EDG went operational at the LP DAAC in 1994; initially, however, its capabilities were limited.

In processing, only Landsat 7 data gave users some choices. ASTER Level-1 data are processed in Japan and shipped on tape media to the LP DAAC, where they are ingested and archived.

ASTER is unique in that it allows users to request that specific higher-level data products be produced on demand. The ODFRM was built specifically to enable this. Building the separate ODFRM actually took less development time than an integrated solution would have, because it could be developed separately from the main EDG/ECS system, which was itself undergoing intense development at the time.

Landsat 7 processed products were not actually processed by the LP DAAC. The LP DAAC EDG would receive processing requests for Landsat 7 data and pass the source data and the processing request to USGS systems, which filled the order. ASTER data are processed through the standard ECS processing system, the Science Data Processing System (SDPS). MODIS data are processed at the MODIS Adaptive Processing System (MODAPS) facility at GSFC prior to ingest and archival at the LP DAAC, so they were made available directly from the archive.

Distribution was initially provided in a very limited form. Data were distributed only via FTP ‘pulls’ (users retrieved their own data) and on a limited number of 8 mm tapes.

While this functionality was limited, it was judged barely sufficient for its primary user group, the NASA instrument science teams. Had they remained the designed-for user group, there might have been relatively little pressure to improve the capability and integration of retrieval, processing, and distribution. However, as system development and subsequently system operations proceeded, the intended user group began to change.

3. A shift in emphasis

With the passage of time, the emphasis shifted from providing data only to the science teams to providing data to the science community in general. This larger user group was not well served by the existing and planned user support systems and practices, so a number of changes were undertaken to improve service to the user. The DAACs, which act as the primary interface with the users, received many of these changes from NASA’s development contractors, and also built some of their own. As a result of these changes, the LP DAAC’s systems evolved as depicted in Figs. 1 and 2.

Fig. 1. LP DAAC-2000.

The systems have been significantly augmented, both to improve the retrieval, processing, and distribution functionality and to integrate those functions more closely. These improvements were made primarily to improve our ability to serve users. Increasing development emphasis has also been placed on user friendliness.

4. A word on archives

To make the discussion below clearer, we briefly describe the evolving archive architecture. Initially, the ECS ‘Version 1’ archive contained all the data and serviced all retrieval, processing, and distribution requests. However, it was anticipated that user distribution demand would become so large that it would overwhelm the archive systems, making it impossible for them to ingest new data. Therefore, the Version 2 archives, also called the Data Pools, were installed. These are large disk caches (44 TB at the LP DAAC) that can store a subset of the DAAC’s data. These data are more easily available to the user than those in the tape-based Version 1 archive, particularly through FTP downloading. The Data Pools are also relatively inexpensive for the LP DAAC to build.

5. Retrieval systems

On the retrieval side, we have worked to improve our interface with science users, since they are now part of our primary user group. With LP DAAC input, NASA has extensively updated the EDG to make it easier and more flexible to use, while integrating new capabilities into it. The EDG now searches both the Version 1 archive and the Data Pool, and provides links to both if the data are found. The functionality of the ODFRM tool (ASTER on-demand product retrieval) has also been integrated into the EDG. The prior ODFRM version is still being maintained, but will be phased out in the future.

In addition, the LP DAAC has developed two new user interfaces, the ASTER Browse Tool and the MODIS Data Pool Client, and NASA has delivered to us a third new user interface, the Data Pool Web Interface.

Fig. 2. LP DAAC-2003.

The ASTER Browse Tool (http://edcdaac.usgs.gov/aster/glovis.html) is a browse navigation client. Unlike a ‘search-and-order’ tool such as the EDG, the ASTER Browse Tool provides the user with a graphical display of the world that they can click on for navigation. Using mouse clicks alone, the user can zoom in to ASTER browse images of their area of interest and navigate and browse through the available collection of data. Once the user has found an image of choice, he or she can take the image ID to the EDG to order it. Soon users will be able to order ASTER data directly from the Tool. The ASTER Browse Tool reuses software and infrastructure from the USGS Global Visualization project, and gets its metadata and browse from ECS via the Bulk Metadata Generation Tool (BMGT). The addition of MODIS browse is also planned.

The MODIS Data Pool Client is also a browse-based tool, albeit a different one from the ASTER Browse Tool. Its interface was reused from a page used internally by the MODIS Instrument Science Team. MODIS divides the Earth’s surface into a relatively small number of tiles, and the client displays a map of those tiles overlaid with browse. The user can examine the browse images and click on them to order the data. In this case, the data are located on the Data Pool and are readily available through FTP.

NASA has also provided the LP DAAC with a Data Pool user interface, the Data Pool Web Interface. This file system-based interface allows users to look directly at what is stored on the disk cache and download the data they wish. It is a default interface that is capable of fully exploring the Data Pool.

Finally, in order to make our data even more widely available to users, we are participating in NASA’s EOS Clearinghouse (ECHO) development. The BMGT exports information to ECHO; unlike the export to the ASTER Browse Tool, this export includes all of our metadata and browse. Once in the ECHO systems, that metadata and browse are made available for non-NASA user interfaces to access and use. This enables new user interfaces designed for specific user groups, and allows us to serve more users. NASA intends to upgrade the EDG client so that it, too, is integrated with ECHO rather than with individual DAAC systems.
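The tile-based navigation used by the MODIS Data Pool Client can be illustrated with a short sketch that maps a latitude and longitude to a tile index. The text does not specify the tiling scheme, so this example assumes the 36 by 18 sinusoidal tile grid used by MODIS land products; the constants and the function names `latlon_to_tile` and `tile_name` are illustrative assumptions, not part of any LP DAAC system.

```python
import math

# Assumed constants of the MODIS sinusoidal tile grid (not given in the text).
EARTH_RADIUS_M = 6371007.181        # radius of the sphere used by the projection
TILE_SIZE_M = 1111950.5196666666    # width and height of one tile in metres
X_MIN_M = -20015109.354             # western edge of the grid
Y_MAX_M = 10007554.677              # northern edge of the grid
N_H, N_V = 36, 18                   # number of tile columns and rows


def latlon_to_tile(lat_deg, lon_deg):
    """Return the (h, v) indices of the tile that contains a point."""
    if not -90.0 <= lat_deg <= 90.0 or not -180.0 <= lon_deg <= 180.0:
        raise ValueError("latitude or longitude out of range")
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Forward sinusoidal projection on a sphere.
    x = EARTH_RADIUS_M * lon * math.cos(lat)
    y = EARTH_RADIUS_M * lat
    # Columns count eastward from the western edge, rows southward from the
    # northern edge; points on the far edges are clamped into the last tile.
    h = min(int((x - X_MIN_M) // TILE_SIZE_M), N_H - 1)
    v = min(int((Y_MAX_M - y) // TILE_SIZE_M), N_V - 1)
    return h, v


def tile_name(lat_deg, lon_deg):
    """Format the tile index in the usual hXXvYY style."""
    h, v = latlon_to_tile(lat_deg, lon_deg)
    return f"h{h:02d}v{v:02d}"
```

A client built this way needs only the tile index to locate the matching browse image and granules, which is what makes a clickable tile map a practical alternative to a search form.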

6. Processing systems

The Science Data Processing System (SDPS) continues to be our core system. However, we have a number of new systems, both planned and in operation, that process data to meet user needs more closely. These systems are integrated with our other systems and, more importantly, with our user interfaces.

For a number of reasons, NASA decided early in the EOS program to have a single data format, called HDF-EOS. This format had the disadvantage of not being widely used, and few tools were built to use it. With the emphasis on the wider group of users, it became imperative to make data available in commonly used formats. Format conversion software was first provided as standalone tools intended to be run by users on their desktops, such as the MODIS Reprojection Tool (MRT; 1500+ registered users) and the Land Data Operational Product Evaluation (LDOPE) Tools (300+ registered users). These are generally not integrated (although the MRT is also used for some internal processing), but they allowed us to get functionality into users’ hands while we worked on integration issues with our systems.

Our first integrated tool is the HDF-EOS to GeoTIFF (HEG) processing system. It allows users to get their data in GeoTIFF format, which is much more widely usable than HDF-EOS. HEG is fully integrated into our systems; users can select GeoTIFF as an option on the EDG or other user interfaces, and HEG will automatically perform the conversion.

The External Subsetter provides additional processing functionality that focuses on subsetting data. Some of our users do not want to order large scenes of data, and instead want data only over their area of interest. This desire for smaller amounts of data agrees with the LP DAAC’s desire to reduce costs and improve service by keeping network traffic as efficient as possible. The External Subsetter is integrated directly into ECS, and is selectable with the EDG or any other user interface that sends us the correct order information. It takes granules and subsets them by geographic extent or by band.

Another processing function that we have added is subscriptions. Because this function can overwhelm our systems, we do not make it user selectable. Instead, subscriptions are used to serve users with particular requirements that we believe are best met that way. In general, subscriptions are used to satisfy instrument science team members’ requirements for quality assurance work, or the needs of scientists who require data over particular geographical areas.

It is important to note that our general strategy is towards greater integration, for the convenience of the users. We typically first develop functionality and use it as a standalone tool, then install it in the system without close integration with all parts of the system, and finally integrate it fully with the user interfaces, metrics systems, and so forth. This allows us to release functionality as soon as possible, without being held up by the broader system schedule.
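The two kinds of subsetting attributed to the External Subsetter above, by geographic extent and by band, can be sketched as follows. This is a minimal illustration under stated assumptions, not the ECS implementation: the granule is modeled as a hypothetical dictionary holding a north-up grid in geographic coordinates with square pixels, and the function name `subset_granule` is invented for the example.

```python
import math


def subset_granule(granule, bbox=None, bands=None):
    """Crop a granule to a bounding box and/or a list of band names.

    granule: hypothetical dict with 'west' and 'north' (upper-left corner,
             degrees), 'pixel_size' (degrees per pixel), and 'bands'
             (band name -> list of rows of pixel values).
    bbox:    (west, south, east, north) in degrees, or None for full extent.
    bands:   band names to keep, or None for all bands.
    """
    names = list(granule["bands"]) if bands is None else list(bands)
    if not names:
        raise ValueError("no bands selected")
    missing = [n for n in names if n not in granule["bands"]]
    if missing:
        raise KeyError(f"unknown band(s): {missing}")

    first = granule["bands"][names[0]]
    n_rows, n_cols = len(first), len(first[0])
    px = granule["pixel_size"]
    row0, row1, col0, col1 = 0, n_rows, 0, n_cols

    if bbox is not None:
        west, south, east, north = bbox
        # Convert the box to pixel indices, keeping every pixel it touches
        # and clipping to the granule's own extent.
        col0 = max(0, math.floor((west - granule["west"]) / px))
        col1 = min(n_cols, math.ceil((east - granule["west"]) / px))
        row0 = max(0, math.floor((granule["north"] - north) / px))
        row1 = min(n_rows, math.ceil((granule["north"] - south) / px))
        if col0 >= col1 or row0 >= row1:
            raise ValueError("bounding box does not overlap the granule")

    return {
        "west": granule["west"] + col0 * px,
        "north": granule["north"] - row0 * px,
        "pixel_size": px,
        "bands": {
            n: [row[col0:col1] for row in granule["bands"][n][row0:row1]]
            for n in names
        },
    }
```

Because the output carries its own corner coordinates, a subset can be delivered and georeferenced without the original granule, which is the property that lets subsetting reduce the volume sent over the network.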

7. Distribution systems

Once users have retrieved and processed the data, the data must be distributed to them. To serve users best, the data should be provided quickly, with a minimum number of delays. When mistakes are made (by us or anyone else), users should have a way to report the problem and have it corrected.

The circa-2000 data system could distribute data only on 8 mm tape and via FTP. Even browse had to be ordered; it was not integrated into the retrieval system (this was recognized as a problem, regardless of who the users were).

The FTP system worked very well. Orders were required to be available for FTP pickup within 48 h of the order, and frequently we were able to respond in less than 3 h. The hard media distribution was not as capable. The 8 mm system was basic and labor-intensive.

Various methods for upgrading the system were examined, both integrated and non-integrated. For the latter, one informal proposal called for a separate system that would sit at the output of ECS and allow us to capture users’ FTP data and place it on tapes for them. An integrated solution was not an easy matter. The expected distribution volume was sufficiently large (thousands of pieces of media per week) that a rushed or clumsy system would be too labor-intensive to operate, and order problems would be too difficult to troubleshoot.

The solution was to reuse an existing system. The USGS has a media distribution system for its own data archives, developed and maintained in the same building that houses the LP DAAC. With NASA concurrence, the LP DAAC upgraded and adapted the USGS Product Distribution System (PDS) for integration into the ECS. The ECS contractor worked with the LP DAAC to integrate the upgraded system into ECS. This new combination of systems provided a relatively inexpensive way to get a mature and operational system into place to provide data on hard media to users. The system is directly integrated with ECS and EDG, and acts on the instructions given by the user.

The next distribution change was the Data Pool. As already discussed, the Data Pool is a large data cache (44 TB today) that stores data for easy access by users. It is accessible as an anonymous FTP site for both people and software, and through at least two different user interfaces: the Data Pool Web Interface (all data) and the MODIS Data Pool Client (a subset of MODIS data). The data selected for the Data Pool were chosen after close consultation with the instrument science teams and the LP DAAC’s Science Advisory Panel, and are generally a collection of recent, high-quality MODIS and ASTER data.

We are continuing to look at other distribution methods. With the increased interest in the user community in data available via OpenGIS Consortium (OGC) protocols, the LP DAAC has begun groundwork for using that channel. We have defined a derivative MODIS product in the TIFF format (we are using the OGC Web Map Service, which handles only TIFF). We are working towards making the data available through existing USGS OGC servers, again reusing software to keep our costs as low as possible.

This and possibly more OGC data are intended to be available eventually through The National Map, a new data system developed by the USGS. The USGS intends to make a variety of mapping and science data available through The National Map, to the extent that this new system becomes a standard data retrieval system of the USGS. This would allow the USGS to provide data to a number of users who may not be aware of the USGS data available today.

Another new and different distribution opportunity is the NASA ECHO program. Currently, it is focused on data retrieval, allowing multiple user interfaces built and operated by non-NASA organizations to access EOS metadata and browse. Once orders begin to come in through our link with ECHO, we can begin exploring ways to satisfy that particular demand.
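To make the OGC channel mentioned above more concrete, the sketch below assembles a Web Map Service GetMap request using the standard WMS 1.1.1 query parameters. It only builds the URL and contacts no server. The server address, the layer name, and the function name `build_getmap_url` are hypothetical; the default `image/tiff` format is chosen to match the TIFF product described in the text.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width, height,
                     srs="EPSG:4326", image_format="image/tiff",
                     version="1.1.1"):
    """Build a WMS GetMap URL for one layer.

    bbox is (west, south, east, north) in the units of the chosen SRS.
    """
    west, south, east, north = bbox
    if west >= east or south >= north:
        raise ValueError("bbox must be (west, south, east, north)")
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    params = [
        ("SERVICE", "WMS"),
        ("VERSION", version),
        ("REQUEST", "GetMap"),
        ("LAYERS", layer),
        ("STYLES", ""),                 # empty value requests the default style
        ("SRS", srs),
        ("BBOX", f"{west},{south},{east},{north}"),
        ("WIDTH", str(width)),
        ("HEIGHT", str(height)),
        ("FORMAT", image_format),
    ]
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server and layer name, used for illustration only.
url = build_getmap_url(
    "https://example.invalid/wms",
    "modis_derivative",
    (-104.0, 42.0, -96.0, 46.0),
    800,
    400,
)
```

Because the whole request is an ordinary URL, any client that speaks the protocol can retrieve the product without LP DAAC-specific software, which is the appeal of reusing existing OGC servers.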

8. Conclusion

The previous sections have attempted to provide a comprehensive description of the LP DAAC. A number of integrated retrieval, processing, and distribution capabilities have been explained. The value of these functions to users has been described, and potential future improvements have been laid out. Our users are clearly interested in having our retrieval, processing, and archiving systems integrated so that they can get the data they want from us in the format and through the delivery mechanism of their choice.

Acknowledgements

The authors would like to thank the United States taxpayers, for providing the funding for this work, and the LP DAAC users, for providing good feedback as to what they like, dislike, and would like to see in the future. We would also like to thank Vanessa Griffin and Howard Dew of NASA, and the men and women of our Science Advisory Panel, for their support. Many thanks go to the men and women of the LP DAAC, present and past, for the good job done while their management learned these lessons. Finally, thanks to Bruce Quirk, for encouraging us to write this paper; John Dwyer, for his helpful suggestions and ideas; Ken Duda, for the original form of the illustrations; and Bhaskar Ramachandran, for his thorough review.

Further Reading

[1] EOS Reference Handbook: A Guide to NASA’s Earth Science Enterprise and the Earth Observing System, EOS Project Science Office, NASA, 1999.