Ecological Informatics 5 (2010) 1–2
Ecological Informatics journal homepage: www.elsevier.com/locate/ecolinf
Advances in environmental information management

Keywords: Metadata; Ontologies; Knowledge representation; Data management; Scientific workflows; Provenance; Geospatial analysis
Environmental informatics lies at the intersection of environmental science and computer science, and enables new advances in environmental research through the application of computing techniques. Advances in computing and the management of environmental information have allowed remarkable new discoveries in the environmental sciences that impact society, for example advances in our ability to predict infectious disease (cf. Peterson et al., 2004) and climate change (cf. Cheung et al., 2009). The field of environmental informatics is growing rapidly, with vast amounts of data streaming in real or near-real time from sensors, complementing datasets elaborately gathered by researchers in the field and the laboratory. Technologies for handling this deluge of data have been forthcoming, but gaps in techniques and technology remain prevalent. For example, most commercially available software systems for collecting streaming data from environmental sensors do not interoperate with one another, and the critical details regarding data collection that are necessary for long-term data preservation are not captured by mainstream data management applications. Two informatics conferences in 2008 focused on advances in environmental informatics that will help to fill these gaps. The first of these, Environmental Information Management 2008 (EIM08), covered the practical intersection of computing and network technology and environmental research. The second, the bi-annual conference of the International Society for Ecological Informatics (ISEI6), had sessions ranging across geographical information systems, biologically inspired computing, and the use of metadata and ontologies in information systems.
These conferences brought together informatics practitioners, developers and environmental scientists interested in technologies that enable data collection, description, curation, discovery, access, integration and analysis in all disciplines of environmental research. In this special feature, we highlight seven papers from these conferences that represent diverse and creative approaches to solving informatics challenges. Four of the papers focus on data management and access through the use of metadata and ontologies, and three relate to advances in the analysis and processing of data.

1574-9541/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.ecoinf.2010.01.001

1. Metadata and ontologies

Although structured metadata, particularly in the format of the Ecological Metadata Language (EML; Fegraus et al., 2005), have long been created as part of environmental information management systems (Michener, 2006), they have mainly been important for limited data discovery and as human-readable descriptions of the data. This in itself is extremely important and has created challenges, especially when expanded to international, multi-language applications. In traditional metadata frameworks such as EML and the Biological Data Profile (BDP), data syntax but not data semantics is presented in a machine-readable way. Thus, computer applications that make use of metadata have been slow to appear. Leinfelder et al. introduce one such application, a data loader that builds a database and loads data by parsing and utilizing the provided metadata, thereby streamlining the process of managing and querying heterogeneous data. Calder et al. continue down this path by leveraging semantic metadata in the form of ontologies to streamline the development of quality assurance and error detection processes. Based on expert rules, data may be flagged, augmented or corrected, effectively implementing a sophisticated and domain-agnostic quality control system.

The creation of structured metadata is generally recognized as a major effort (Jones et al., 2001) and represents one of the disincentives for individual researchers to publish data. A new, highly customizable online metadata editor is introduced by Aguilar et al., who also describe earlier experiences in developing usable editors that reduce the effort required for metadata provision. To enable data integration and processing of the kind seen in Calder et al., we need, in addition to natural-language metadata, ontologies that provide formal, semantic definitions of ecological concepts and defined relations among those concepts. Creating these ontologies has proven to be difficult and time consuming. Bowers et al.
conclude this section by describing the Owlifier system, which allows rapid creation of OWL ontologies through the use of structured spreadsheets that are translated into OWL format.

2. Data analysis, workflows, and management

Real-time or near-real-time sensor data streams are becoming increasingly prevalent in ecological research and are being used to trigger event alerts, for general monitoring, and for driving models. Therefore, large amounts of data have to be quality controlled, documented, stored, and made accessible. Because sensor data streams are received in a fairly standardized format (date-time, location, variable, and value), some progress has been made in developing standardized protocols for handling them. Conover et al. present applications using those standard protocols, developed and endorsed by the Open Geospatial Consortium (OGC). An alternate, process-oriented approach is to utilize scientific workflow systems (Taylor et al., 2006) such as Kepler (Ludäscher et al., 2006) that provide a framework for data management and analysis. Barseghian et al. extend the Kepler system to use scientific workflows to access
and analyze near-real-time sensor data streams together with archived sensor data. In the final paper, Porter et al. discuss the management and usefulness of image data streams for answering important research questions. They apply various feature extraction algorithms to image streams in order to streamline the process of identifying images with scientifically interesting content. As the volume of data we collect each year grows, these automated approaches to data processing will become increasingly important.

3. Challenges for the future

Although information management has been practiced for many years within ecological research networks, many hurdles must still be overcome before interdisciplinary access to and integration of data can be considered efficient and effective. The papers from EIM08 and ISEI6 represent technical advances in our collective ability to preserve data for long-term re-use and synthesis. These technical advances in informatics play a critical role in the evolving drive towards “open science” (Willinsky, 2005; David, 2001). However, technical advances alone cannot achieve the vision of open science (David, 2005). While some progress has been made towards changing scientific culture and establishing policies to enable open, transparent, and fair data sharing (Porter, 2010), much remains to be accomplished (Arzberger et al., 2004; Costello, 2009; Anonymous, 2009), as incentives for making data available are missing or theoretical at best, and disincentives for the individual researcher are plentiful. There is currently a distinct disconnect in environmental science between the technical advances in information processing and the adoption of those techniques by the broader science community. As the field of informatics advances, it will become increasingly beneficial for environmental researchers to utilize these new tools and techniques to preserve data and to use those data to address the pressing environmental challenges that confront us.
References

Anonymous, 2009. Data's shameful neglect. Nature 461, 145.
Arzberger, P., Schroeder, P., Beaulieu, A., Bowker, G., Casey, K., Laaksonen, L., Moorman, D., Uhlir, P., Wouters, P., 2004. An international framework to promote access to data. Science 303, 1777–1778.
Cheung, W.L., Lam, V.W.Y., Sarmiento, J.L., Kearney, K., Watson, R., Pauly, D., 2009. Projecting global marine biodiversity impacts under climate change scenarios. Fish and Fisheries 10, 235–251.
Costello, M.J., 2009. Motivating online publication of data. BioScience 59, 418–427.
David, P.A., 2005. Towards a Cyberinfrastructure for Enhanced Scientific Collaboration: Providing its ‘Soft’ Foundations may be the Hardest Part. Law and Economics 0502002, EconWPA. RePEc:wpa:wuwple:0502002.
David, P.A., 2001. From Keeping ‘Nature's Secrets’ to the Institutionalization of ‘Open Science’.
Fegraus, E., Andelman, S.J., Jones, M.B., Schildhauer, M., 2005. Maximizing the value of ecological data with structured metadata: an introduction to ecological metadata language (EML) and principles for metadata creation. Bulletin of the Ecological Society of America 86 (3), 158–168.
Jones, M.B., Berkley, C., Bojilova, J., Schildhauer, M., 2001. Managing scientific metadata. IEEE Internet Computing 5 (5), 59–68.
Ludäscher, B., Altintas, I., Berkley, C., Higgins, D., Jaeger-Frank, E., Jones, M., Lee, E., Tao, J., Zhao, Y., 2006. Scientific workflow management and the Kepler system. Special issue: workflow in grid systems. Concurrency and Computation: Practice and Experience 18 (10), 1039–1065.
Michener, W.H., 2006. Meta-information concepts for ecological data management. Ecological Informatics 1, 3–7.
Peterson, A.T., Bauer, J.T., Mills, J.N., 2004. Ecologic and Geographic Distribution of Filovirus Disease. Emerg Infect Dis [serial online]. Available from: http://www.cdc.gov/ncidod/EID/vol10no1/03-0125.htm.
Porter, J.H., 2010. A brief history of data sharing in the U.S. long term ecological research network. Bulletin of the Ecological Society of America 91, 14–20. doi:10.1890/0012-9623-91.1.14.
Taylor, I.J., Deelman, E., Gannon, D.B., Shields, M. (Eds.), 2006. Workflows for e-Science: Scientific Workflows for Grids. Springer, London. ISBN: 978-1-84628-519-6, 530 pp.
Willinsky, J., 2005. The unacknowledged convergence of open source, open access, and open science. First Monday 10 (8-1) August 2005. Available at: http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/1265/1185.
Matthew B. Jones, Editor
National Center for Ecological Analysis and Synthesis (NCEAS), University of California, Santa Barbara, 735 State St., Santa Barbara, CA 93101, United States
E-mail address: [email protected].

Corinna Gries, Editor
Center for Limnology, University of Wisconsin, 680 North Park Str., Madison, WI 53706, United States
Corresponding author. E-mail address: [email protected].