Chapter 7 Storage, Maintenance and Extraction of Digital Soil Data


Developments in Soil Science, volume 31. P. Lagacherie, A.B. McBratney and M. Voltz (Editors). © 2007 Elsevier B.V. All rights reserved.


C. Feuerherdt and N. Robinson

Abstract

With the advent of today’s technologies, more and more soil information is being collected. The increased volume of data collected with these technologies poses questions regarding the storage, maintenance and retrieval of these data. This chapter outlines the database structure adopted and engineered by the Victorian Department of Primary Industries (DPI) from the Australian Soil Resource Information System (ASRIS). The database structure and its mechanisms for the collection, storage, maintenance and retrieval of data are discussed, focussing on the improved distribution of data by DPI to state, national and international communities.

7.1 Introduction

A vast number of spatial datasets exist today. These data are housed in a variety of data libraries, ranging from simple file-based systems through to complex database structures. The concept of information domains can help bring order to otherwise disparate datasets. All data relating to natural resources fit into one of three broad information domains:

1. Climate – incorporates data relating to temperature, rainfall, radiation, etc.
2. Surface – includes land use, surface water, biodiversity, etc.
3. Earth – comprises soil, groundwater, regolith, geology, etc.

Previously, each dataset has been seen as a separate entity. This has made it difficult to analyse different datasets, even from the same domain, because they have been treated heterogeneously. The adoption of information domains for classifying data has benefits for data analysis, as similar data are stored in a logical, consistent structure.

There are many organisations across Australia that collect, store and use soil data. At the national level, the Natural Heritage Trust (NHT) has undertaken the National Land and Water Resources Audit (NLWRA), which is an attempt to collate and assess a variety of biophysical assets. The data collated as part of this project have been sourced from relevant authorities in each Australian state. In Victoria, two state departments collect and maintain biophysical data: the Department of Sustainability and Environment (DSE) and the Department of Primary Industries (DPI). Primary Industries Research Victoria (PIRVic) is a division of DPI focussing on research and development.

The NLWRA commissioned a team of experts to create a nationally consistent soil database from various state and national sources. A whole-of-earth-domain approach was adopted during the development of the database structure, potentially allowing any data from the earth domain to be stored in a single database and giving users access to data in ways not previously considered. The resulting database, the Australian Soil Resource Information System (ASRIS), provides a structure capable of storing a range of biophysical data, but is currently focussed on soil data. Since its inception, the data structure has been refined to fewer than 12 tables and contains more than 16,000 fully characterised soil profiles from the agricultural regions of Australia (Johnston et al., 2003).

Previously, a substantial amount of time and effort was put into maintenance of the Victorian Soil Site Database (VSSD), which comprised some 40 distinct tables. This cumbersome structure required a high level of soils knowledge to input, update and extract data. Rather than developing a new database structure, it was more efficient to adopt the existing ASRIS database, which would also provide future benefits. Prior to the implementation of the new data structure, the VSSD contained over 2000 fully described soil sites. All soil information collected in the field was recorded on paper sheets; a soil technician then manually entered these into a Microsoft Access database.
This was a laborious process, with only a few points being entered per week. This lack of currency reduced the value of the entered data and meant that soil scientists used the database only occasionally.

This chapter outlines the adoption of a new database structure for the storage, maintenance and retrieval of soil data in Victoria. The alterations made to the ASRIS structure are discussed in the context of the issues raised by the creators of the structure and the benefits they provide to soil scientists in the DPI.

7.2 Methods and data

To allow seamless data migration between PIRVic and the federal authorities, it was decided to adopt the ASRIS data structure. For the database to be useful in PIRVic, it needed to be accessible and usable across all aspects of the business chain (Fig. 7.1). The current ASRIS database is focused on obtaining resource data rather than explicit data collection. The alteration of the ASRIS data structure to increase its applicability across the breadth of the business chain was the ultimate focus of the project.

Before the existing VSSD data were migrated into the new structure, the code tables were thoroughly cross-checked to compare and document the missing, different and additional codes and measures. This resulted in the creation of a master code table (Fig. 7.2 – Classification codes) containing all codes from ASRIS and the VSSD. Once the comprehensive code tables were created and additional tables incorporated, existing data were ready for migration. A separate table was
generated for each measure existing in the VSSD. Each of these individual tables, and their associated location tables, were then merged into the new structure shown in Figure 7.2.

Figure 7.1. Business chain. [Stages shown: obtaining resource data, data management, knowledge creation, knowledge transfer, implementation.]

Figure 7.2. Adopted database structure. [Diagram of tables: People, Projects, Agencies, LOCATION, FEATURES, SAMPLES, OBSERVATIONS, OBS. METHOD, Classification Codes, Observation Qualifiers and QA Rules, linked by 1:1, 1:Many and Many:1 relationships.]

In the report prepared by Johnston et al. (2003) as part of the ASRIS development, a list of data quality issues was outlined. PIRVic addressed some of these issues by making alterations to the ASRIS structure. In order to adopt a whole-of-earth-domain approach, the ‘Observation qualifiers’ table (Fig. 7.2) was added, potentially allowing each measurement or observation to be attributed with the purpose for which it was collected, for example groundwater monitoring or incorporation into the national soil network. This table provides intelligence around each observation and can be used to query the data contained in the database holistically rather than site by site, thereby solving two issues listed by Johnston et al. (2003).

The changes carried out by PIRVic were additions to, rather than alterations of, the existing ASRIS structure. These changes were communicated to the ASRIS team for consideration in the future development of the database structure. Some of the remaining issues are perennial across the soil domain; however, PIRVic is approaching them in the following ways.





• Inconsistencies in the way soil horizons are named and described. The inconsistencies reflect the evolution of soil and pedological description in Victoria. The current standard published in the Australian Soil and Land Survey Field Handbook (McDonald et al., 1990) provides a means of consistency in recent soil description. Past soil descriptions, where appropriate, will be updated to reflect this; however, limitations in data availability may make this task difficult.
• Non-existent or inconsistent taxonomic descriptions. Numerous taxonomic systems have existed and evolved throughout Australian soil and field survey. Most recently the Australian Soil Classification (Isbell, 1996) has been adopted. All taxonomic descriptors have different soil observation requirements for classification; therefore, limiting the collation of sites to ‘one system’ [possibly the Principal Profile Form (Northcote, 1979)] offers the best solution.
• Differences in methods used for measurement of specific properties. Standardisation of methods is difficult without further research to develop statistically significant relationships between tests and methodologies. This has been an issue with methods of the ASRIS database (e.g. carbon measurement). Another factor that will influence method standardisation is the possible distribution of this database to other agencies that collect soil data. It is conceivable that other industries (agronomic, building site soil testers, etc.) could input data into the database, increasing the coverage of soil information. Inclusion of methods that cover all possibilities would be seen as beneficial.
• Failure to distinguish between a null field and a zero result. The approach adopted by Victoria is that if there is no record in the database, no observation was made. If an observation was made, then there should be an appropriate entry in the database. Historical data, however, did not make this distinction, and some data will require interpretation during translation.
• Typographical or other errors in data entry. This issue is perennial with any data entry task. To minimise errors made during data entry, rules can be attached to each reading type that specify value ranges (e.g. 0–14 for pH) or specific national codes.
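The last two points can be illustrated with a small sketch. It assumes a hypothetical rule table keyed by measure code; the names `RULES`, `validate` and `to_store`, and the codes `PH`, `EC` and `TEXTURE`, are illustrative and are not taken from ASRIS or the VSSD. Each measure has either a numeric range or a list of permitted codes, and a missing value (`None`) means that no observation was made, so a measured zero remains a valid entry.

```python
from typing import Optional, Union

Value = Union[int, float, str, None]

# Hypothetical rule table: each measure has either a numeric range or a code list.
RULES = {
    "PH": {"min": 0.0, "max": 14.0},        # range quoted in the text
    "EC": {"min": 0.0, "max": None},        # assumed: non-negative, no upper bound
    "TEXTURE": {"codes": {"S", "L", "C"}},  # placeholder codes, not the national list
}


def validate(measure: str, value: Value) -> Optional[str]:
    """Return None when the entry is acceptable, otherwise a short error message."""
    if value is None:
        # No observation was made, so no record is stored and nothing is checked.
        return None
    rule = RULES.get(measure)
    if rule is None:
        return f"unknown measure code {measure!r}"
    if "codes" in rule:
        if value in rule["codes"]:
            return None
        return f"{value!r} is not a permitted {measure} code"
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return f"{measure} expects a number, got {value!r}"
    low, high = rule["min"], rule["max"]
    if low is not None and value < low:
        return f"{measure} value {value} is below {low}"
    if high is not None and value > high:
        return f"{measure} value {value} is above {high}"
    return None


def to_store(entries: dict) -> dict:
    """Keep measured values, including zero, and drop entries with no observation."""
    return {measure: value for measure, value in entries.items() if value is not None}
```

Keeping "no observation" as the absence of a record, rather than as a stored zero or blank, is what lets a later query tell the two cases apart.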

The user interface is rudimentary, with the addition of some time-saving features. All of these features are located on the ‘profile and measurements’ form and implemented as buttons utilising VB script and macros. The buttons include:

• The ‘add measures’ button – clicking this button adds the relevant measures to the list, reducing the time taken to make entries. The measures added depend on which soil profile is being described, that is, the surface or any other profile.
• The ‘remove nulls’ button – once measurement values have been added, there may be particular measurement types with no entries (null values). Rather than storing these as superfluous records in the database, they can be removed by selecting this button.
• The ‘add multiples’ button – it is possible to have more than one measurement for some measurement types, for example segregations. This button adds these measures, saving the user the tedious task of doing it manually.

The inclusion of these time-saving tools makes it easier for new users to input data efficiently.
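The behaviour of the three buttons can be sketched in a few functions. The original was implemented with VB script and macros in Microsoft Access; the Python below is only an illustration of the logic, and the measure names, the `MULTIPLE_MEASURES` set and the surface/horizon split are assumptions rather than the actual lists used by DPI.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical default measures; the real lists would come from the master code table.
SURFACE_MEASURES = ["surface_condition", "coarse_fragments"]
HORIZON_MEASURES = ["texture", "colour", "ph", "segregations"]
MULTIPLE_MEASURES = {"segregations"}  # measure types that may occur more than once


@dataclass
class Measurement:
    measure: str
    value: Optional[str] = None  # None means nothing has been entered yet


@dataclass
class Profile:
    is_surface: bool
    measurements: List[Measurement] = field(default_factory=list)


def add_measures(profile: Profile) -> None:
    """'Add measures': append a blank row for every measure relevant to the profile."""
    names = SURFACE_MEASURES if profile.is_surface else HORIZON_MEASURES
    profile.measurements.extend(Measurement(name) for name in names)


def add_multiples(profile: Profile) -> None:
    """'Add multiples': append one extra blank row for each repeatable measure present."""
    present = {m.measure for m in profile.measurements}
    for name in sorted(present & MULTIPLE_MEASURES):
        profile.measurements.append(Measurement(name))


def remove_nulls(profile: Profile) -> None:
    """'Remove nulls': drop rows that were never filled in."""
    profile.measurements = [m for m in profile.measurements if m.value is not None]
```

A typical sequence would be `add_measures`, optionally `add_multiples`, data entry, and then `remove_nulls` so that only rows holding an actual observation are stored.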

7.3 Results

Although the process of adopting a new data structure was tedious, multiple benefits have been realised.

7.3.1 User interface

To streamline data entry, a simple interface was developed in Microsoft Access. The creation of this entry interface allows data entry to be easily distributed, both within DPI and to external departments and contractors. The interface allows all soil data to be collected in a homogeneous manner. The user interface comprises two main forms, with four additional forms used for entering primarily descriptive data. The forms allow users to add projects, soil sites and profiles. Once these data are entered, individual measures can be added against each profile (including the surface) using drop-down lists. These lists are created from the master code table and reduce data entry errors by ensuring that only codes defined in that table can be entered.

7.3.2 Data collection protocols

The current interface makes it possible to take a portable computer into the field for data collection. The benefits of collecting data directly into the database include:

• Negating the need to enter paper records upon return to the office.
• Reducing coding errors, as only values defined in the database can be used.
• The ability to run simple queries to validate data collected.
• A streamlined process that decreases the time from data collection to data availability.

7.3.3 Data from external sources

The development of an interface in Microsoft Access allows users to collect soil information according to national standards. A wide variety of people are collecting soil data and each has their own requirements, for example soil tests for building sites or soil fertility tests. The ASRIS database has the functionality to store a wide range of measures that cover all conceivable tests. If the database were adopted as a standard collection tool for the various users, the data collected would be of benefit to all users of soil information. Data from multiple databases can be easily merged back into the master database held in Microsoft SQL Server by running update queries.

7.3.4 Data retrieval

Currently there is no method of extracting data from the central repository without knowledge of databases and SQL. It is envisaged that data query and extraction functionality will be delivered by means of an SQL client residing on the soil scientist’s desktop. This client would allow read-only access to all the data in Microsoft SQL Server, allowing queries to be performed and results extracted for further analysis. This concept could be expanded to include a web client giving anyone with appropriate privileges (e.g. agronomists) the ability to query and visualise the results.
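The read-only access envisaged in Section 7.3.4 can be sketched as follows. This is a minimal illustration that uses SQLite from the Python standard library as a stand-in for the Microsoft SQL repository; the `observations` table, its columns and the sample rows are made up for the example and do not reflect the ASRIS schema.

```python
import sqlite3
from pathlib import Path


def build_demo_db(path: str) -> None:
    """Create a tiny stand-in for the central repository (made-up values)."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE observations ("
        "site_id TEXT, upper_depth_m REAL, measure TEXT, value REAL)"
    )
    con.executemany(
        "INSERT INTO observations VALUES (?, ?, ?, ?)",
        [
            ("SITE_A", 0.0, "PH", 5.8),
            ("SITE_A", 0.3, "PH", 6.4),
            ("SITE_B", 0.0, "PH", 7.9),
            ("SITE_B", 0.0, "EC", 0.12),
        ],
    )
    con.commit()
    con.close()


def open_read_only(path: str) -> sqlite3.Connection:
    """Open the database so that queries work but any write is refused."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def surface_values(con: sqlite3.Connection, measure: str) -> list:
    """Return (site_id, value) pairs for one measure at the soil surface."""
    rows = con.execute(
        "SELECT site_id, value FROM observations "
        "WHERE measure = ? AND upper_depth_m = 0 ORDER BY site_id",
        (measure,),
    )
    return rows.fetchall()
```

Wrapping common questions in small functions such as `surface_values` is one way a desktop or web client could serve users who do not write SQL themselves, while the read-only connection protects the central data from accidental edits.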
7.3.5 Information product creation

The predominant purpose of soil data is to provide input into a wide variety of applications, including resource allocation and modelling. Rather than provide clients with the detailed soil data stored in the database, generalised information products (Plate 7 (see Colour Plate Section) and Fig. 7.3) are used to highlight key characteristics of particular soil sites and soil units.

Figure 7.3. Example page of a soil landform unit description.

Currently, the creation of these information products is an intensive process requiring data to be collated from disparate sources, including photographic repositories, the soil sites database, other spatial data layers and a significant amount of expert knowledge. The adoption of the new database structure will streamline the creation of these information products. By using templates and the mail merge function in Microsoft Word, it will be possible to extract data into a standard format while ensuring that it abides by protocols. Manual editing will still be required to format pages appropriately, but in time, it is hoped that this manual editing can be minimised by creating custom tools.

7.3.6 Additional benefits

The process undertaken in Victoria will provide input into the future development of the ASRIS database. A comprehensive report will be produced documenting all alterations and additions made. This could be used to incorporate further improvements into the ASRIS database. The streamlined interface has allowed relatively inexperienced users to input field sheets at a rate of 3–4 per hour. This has resulted in an increased volume of data being consolidated into digital format, allowing collation of vast amounts of historic and recently collected field data. Having these data in a readily accessible, digital format adds value to irreplaceable soil data collected decades earlier.
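As a sketch of the template-based extraction described in Section 7.3.5, the example below fills a fixed text layout from a single site record. The chapter proposes Microsoft Word mail merge; Python's `string.Template` is used here only as a stand-in, and the layout, the field names and the `render_site_summary` function are hypothetical.

```python
from string import Template

# Hypothetical layout and field names for a short site summary.
SITE_SUMMARY = Template(
    "Site $site_id ($classification)\n"
    "Surface texture: $surface_texture\n"
    "Surface pH: $surface_ph\n"
)
REQUIRED_FIELDS = ("site_id", "classification", "surface_texture", "surface_ph")


def render_site_summary(record: dict) -> str:
    """Fill the template, refusing records that lack a required field."""
    missing = [name for name in REQUIRED_FIELDS if record.get(name) is None]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return SITE_SUMMARY.substitute(record)
```

Checking for required fields before substitution is one simple way to make sure every generated page follows the same protocol; a value of zero is accepted, since only a missing value counts as absent.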

7.4 Conclusion

The adoption of the ASRIS data structure has refined the collection, storage, maintenance and extraction of soil data in PIRVic. The database provides a cheap, robust and easily distributed means of collecting soil data and simplifies the process of providing updates to the national database. The simple structure of the database greatly simplifies the process of inputting and extracting data and therefore increases its use.

A field-based application allowing soil scientists to directly input field data is currently being tested. This simple entry application removes the need for a data entry person to interpret handwritten notes. It also ensures that data are collected to a minimum standard and use common nomenclature. The application also gives the soil scientist the freedom to edit, alter or update data as further work is carried out.

Although this data model has been tested and adopted for the collection of soil point data, the structure can easily accommodate other data. Attributes of soil landform units, land capability units or even groundwater monitoring data could be collected and stored in the same database, with links to Geographical Information System (GIS) layers representing their spatial extent. The storage of these varied but inter-related data in one database allows complex querying and extraction of many data layers, potentially improving our understanding of the relationships between these related datasets.

References

Isbell, R.F., 1996. The Australian Soil Classification. CSIRO Publishing, Melbourne.
Johnston, R.M., Barry, S.J., Bleys, E., Bui, E.N., Moran, C.J., Simon, D.A.P., Carlile, P., McKenzie, N.J., Henderson, B.L., Chapman, G., Imhoff, M., Maschmedt, D., Howe, D., Grose, C., Schoknecht, N., Powell, B., Grundy, M., 2003. ASRIS: The database. Aust. J. Soil Res. 41, 1021–1036.
McDonald, R.C., Isbell, R.F., Speight, J.G., Walker, J., Hopkins, M.S., 1990. Australian Soil and Land Survey Field Handbook. Inkata Press, Melbourne.
Northcote, K.H., 1979. A Factual Key for the Recognition of Australian Soils. Rellim Technical Publication Pty. Ltd., Adelaide.

Plate 7. Australia: example layout of a two-page soil site description.