Geoderma xxx (xxxx) xxx–xxx
Contents lists available at ScienceDirect
Geoderma journal homepage: www.elsevier.com/locate/geoderma
Past, present & future of information technology in pedometrics
David G. Rossiter*
ISRIC World Soil Information, PO Box 353, Wageningen 6700 AJ, The Netherlands
Section of Soil & Crop Sciences, Cornell University, 242 Emerson Hall, Ithaca, NY 14850, USA
School of Geography Science, Nanjing Normal University, 1 Wenyuan Road, Nanjing, Jiangsu 210046, China
ARTICLE INFO
ABSTRACT
Handling Editor: Dr. A.B. McBratney
Although pedometric approaches were taken as early as 1925, the post-WWII development of information technology radically transformed the possibilities for applying mathematical methods to soil science. The most significant development is the digital computer and associated developments in computer science. These allowed statistical inference on large pedometric datasets, e.g., numerical taxonomy of soils from the early 1960s and geostatistics from the mid-1970s, as well as simulation models of soil processes. By the time of the first Pedometrics conference in 1992 sufficient computing power was available for stochastic simulation and complex geostatistical procedures such as disjunctive kriging. In the intervening 25 years computing power has grown to almost magical proportions, allowing individual scientists to carry out complex procedures. A second development is the growth of networking. This facilitates collaboration among pedometricians, rapid communication with journals, collaborative programming and publication, and easy access to resources. A third development is the availability of on-line storage of large datasets, especially of open data, including GIS coverages and remotely-sensed images. This allows pedometricians working on geographic problems to integrate sources from multiple disciplines, most notably in digital soil mapping using a wide variety of covariates related to soil genesis. It also promotes an open-source movement of collaborative development of computer programs useful for pedometrics. A fourth development is the wide range of digital sensors which provide data for pedometrics; sensors include spectroscopy, electromagnetic induction, and γ-ray detectors, connected to each other and to central data stores.
A fifth development is wireless technology, including mobile computing and telephony, again greatly facilitating rapid and extensive data collection – in pedometrics, the more dense the data, the greater the analytical possibilities and the lower the uncertainty. A final development is a global navigation satellite system (e.g., GPS), making accurate georeference of field data a routine part of data collection, and thereby assuring the highest possible accuracy in maps made by predictive pedometric methods. As computer power increases, models can be correspondingly detailed. As sensor networks and remote sensing provide ever more abundant and spatiotemporally fine-resolution measurements, the challenge is to manage this information and maintain the link with pedologic and soil-landscape knowledge, within the context of societal needs for the results of pedometric analysis.
Keywords: Information technology; Pedometrics; Simulation models; Geostatistics; Numerical taxonomy; Soil-landscape evolution
1. Introduction
Information technology (IT) is defined by the Oxford English Dictionary as “[t]he branch of technology concerned with the dissemination, processing, and storage of information, esp. by means of computers; abbreviated IT.” This definition implies that algorithms and computational methods as such are not part of IT; rather, the technology to implement them is included in the concept. Acquisition of information (e.g., by digital sensor systems) is not included, only its processing, storage, and dissemination once acquired; however, since data must be acquired before it can be used in pedometrics, we include it in
the present discussion. Information technology includes (1) data processing (“digital computers”); (2) digital data storage; (3) digital information sources; (4) computer operating systems; (5) computer programs; information standards; (6) communication & networks; and (7) digital sensors (satellite, air, field, lab.) and their associated digital infrastructure to convert raw sensor data into information products. Pedometrics may be defined as “the application of mathematical and statistical methods in the study of the distribution, the characterization and the genesis of soils” (Heuvelink, 2003). Thus the topic of this paper is the development of IT as it has affected pedometrics: how pedometricians work with IT, what aspects of
Corresponding author address: Section of Soil & Crop Sciences, Cornell University, 242 Emerson Hall, Ithaca, NY 14850, USA.
https://doi.org/10.1016/j.geoderma.2018.03.009 Received 3 November 2017; Received in revised form 2 March 2018; Accepted 12 March 2018 0016-7061/ © 2018 Elsevier B.V. All rights reserved.
Please cite this article as: Rossiter, D., Geoderma (2018), https://doi.org/10.1016/j.geoderma.2018.03.009
for linear regression they grouped observations, rather than use all 500 to fit the model. Fifteen years later Haines and Keen (1925a) carried out a series of dynamometer experiments and produced a contour map of the drawbar pull required over a field, again by hand; this may be the first two-dimensional continuous soil property map. In a subsequent article (Haines and Keen, 1925b) they expanded this to a 3D solid model. In 1937 Youden and Mehlich (1937) computed what were effectively variograms of soil properties at spacings of a nested sampling design, along with a nested analysis of variance, again all by hand. The first moves beyond hand computation were made possible by the development of the digital computer. An early pedometric application was numerical taxonomy applied to soils (Arkley, 1971; Bidwell and Hole, 1964; Hole and Hironaka, 1960; Rayner, 1966). In 1960 Hole and Hironaka (1960) were apparently computing taxonomic distances by hand, probably using a mechanical calculator. They constructed 3D models of similarity by cutting rods in lengths proportional to the dissimilarity indices and connecting them into a ball-and-stick figure. In 1964 Bidwell and Hole (1964) mention “electronic computers”, without specifying which, implying that matrices and dendrograms were computed on a mainframe computer. That paper includes a dendrogram, a graphical similarity matrix, and a 3D ordination plot; however these appear to have been drawn by hand based on the results of computations, not drawn by a computer graphics program. By 1971 Arkley (1971) mentions “large computer and multivariate statistics”, but does not specify what is “large” nor which computer was used.
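To give a sense of the kind of computation that was once done by hand, the following is a minimal sketch of an empirical semivariogram for equally spaced observations along a transect. It uses the standard Matheron estimator, which is an assumption made here for illustration and is not the nested analysis of variance used by Youden and Mehlich (1937). The function name `empirical_semivariogram` and the transect values are hypothetical.

```python
def empirical_semivariogram(values, max_lag):
    """Matheron estimator for equally spaced observations along a transect.

    Returns {lag: semivariance}, with lag in units of the sampling interval.
    """
    result = {}
    for lag in range(1, max_lag + 1):
        # All pairs of observations separated by exactly `lag` intervals
        pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
        if not pairs:
            break
        # Half the mean squared difference between paired values
        result[lag] = sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))
    return result


# Hypothetical transect of a soil property, one value per station
transect = [5.1, 5.3, 5.2, 5.6, 5.9, 6.0, 5.8, 6.2, 6.4, 6.3]
gamma = empirical_semivariogram(transect, max_lag=3)
for lag, semivariance in gamma.items():
    print(lag, round(semivariance, 4))
```

Even for this ten-point transect each lag needs several subtractions, squarings and a sum, which suggests why hand computation limited early studies to small datasets.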
Also during the 1960s, models of individual soil processes, such as heat and water transport, were developed, “consist[ing] mostly of analytical solutions of partial differential equations for well-defined soils and porous media, numerical solutions of single partial differential equations, or conceptual models that were solved with analog or digital computers” (Vereecken et al., 2016). During the early development of soil information systems (1975–1983) computers came into wide use. Typical is the information system of STIBOKA (Dutch Soil Survey Institute) in 1977 (Bie and Schelling, 1978). This consisted of a CDC Cyber 72 mainframe computer, the NOS/BE operating system, the ANSI FORTRAN IV programming language, a relational database built by the Natural Environment Research Council (UK), a Data General NOVA 1200 minicomputer (32K words) for graphics, a digitizer and plotter, a 19” CRT display, a 14M word disk, a 9 track 800 b.p.i. magnetic tape, and teletype input. In 1984 this homebrew database was replaced with a commercial product (ORACLE) (Buurman and Sevink, 1995). This illustrates the increasing maturity of commercial IT offerings over this decade. This was also the period when geostatistical approaches, primarily various forms of kriging, were first applied to pedometrics. A typical paper is by Yates et al. (1986) in which a FORTRAN IV program to perform disjunctive kriging is described. Their example dataset consisted of only 92 points. Although the authors did not discuss limitations of computing resources, the implication is that only small datasets could be handled by the IT available to them. Towards the end of this period the so-called “personal” computer, on the desktop of the individual pedometrician, became available (e.g., IBM PC 1982, Apple Macintosh 1984). Computer programs specifically for these were soon developed, for example the IDRISI grid-based GIS, developed from 1987 (Eastman and Fulk, 1993).
For more computationally-demanding applications, this era also saw the advent of desktop Unix workstations (e.g., Sun-1 1982), although these do not appear to have been used by pedometricians until about ten years later. Towards the end of the 1980s, the development of more powerful computers and improved numerical methods (Press, 1986) integrated into subroutine packages (e.g., Dongarra, 1979) allowed the development of soil models that integrated different types of processes (physical, chemical, biological) and their interaction; these were applied at various scales from pore to global at different levels of process generalization (Vereecken et al., 2016).
IT are most important to the work of pedometricians, how IT has changed the way pedometricians work and the topics we take up, and how it may affect our work in the near future. It does not consider the intellectual development of pedometrics, only how new possibilities in IT have facilitated advances in pedometrics.
Pedometricians are a sub-class of academic and research soil scientists, which are themselves a sub-class of academic and research earth or biological scientists, which are themselves a sub-class of research scientists in general. The use of IT thus can be explained in a hierarchy. All research scientists, including academic and research soil scientists, use IT for preparing and editing manuscripts and proposals (using text, mathematics and graphics), presenting scientific work to peers, funders and the public (perhaps also using animations), searching for and reading scientific results, designing research, and communicating and collaborating with peers. In addition, earth scientists use IT to navigate and collect data in the field using various sensor systems, to search for and use digital data on the Earth system, to analyze samples with computer-controlled or -assisted lab. instrumentation, and to organize data and results in databases and networked repositories. In addition, pedometricians use IT to write and execute pedometric algorithms (1) to summarize and manipulate sensor data; (2) to model soil properties and functions, spatio-temporal variability, and landscape evolution; (3) to predict and simulate using these models; (4) to design sampling schemes; and (5) to produce digital soil maps or other soil geographic databases.
2. Pedometrics timeline
The development of pedometrics up to the first Pedometrics conference in 1992 was reviewed by Webster (1994). From 1975 to 1983 an IUSS working group on soil information systems, reviewed by Burrough (1991), dealt with IT as it related to acquisition, storage and retrieval of map and point data.
I reviewed the above-cited papers and all the papers in the special journal issues resulting from the Pedometrics conferences from 1992 (Wageningen) (De Gruijter et al., 1994), 1997 (Madison) (De Gruijter, 1999), 1999 (Sydney) (Odeh and McBratney, 2001), 2001 (Ghent) (Van Meirvenne and Goovaerts, 2003), 2003 (Reading) (Oliver and Lark, 2005), 2005 (Naples FL), 2007 (Tübingen), 2009 (Beijing) (Zhu et al., 2012), 2011 (Třešt, CZ) (Borůvka et al., 2014), 2013 (Nairobi) (Anonymous, 2016) and 2015 (Córdoba) (Vanwalleghem et al., 2018), as well as the Fuzzy Sets in Soil Science conference 1995 (de Gruijter et al., 1997). Very few pedometrics-related papers in Geoderma, the European Journal of Soil Science (EJSS), the Soil Science Society of America Journal (SSSAJ), Mathematical {Geology, Geosciences} and similar journals describe the IT used (computer hardware, networks). Some mention the computer programs used, but all describe the algorithms, e.g., simulated annealing, sensitivity analysis, random forests, generalized linear models, and process models. From the nature of the algorithms and datasets we can infer something about the IT needed for the reported studies. If field or remote sensors were used for data acquisition, they are described.
3. History of IT use in pedometrics
A set of tables (see Supplementary material) showing selected milestones in various aspects of IT relevant for pedometrics is provided for context to the following discussion. Going back to the earliest days of quantitative soil science, in 1910 Mercer and Hall (1911) studied the variability of wheat yield in an apparently uniform field, one of a series of uniformity trials popular at the time (Cochran, 1937). Although they did not measure the soil, they inferred that the substantial local variability in crop yield and grain/straw ratio could in part be attributed to local differences in soil nutrients.
They computed numerical summaries, linear regression, and probable errors, all by hand. To simplify hand computations
By 1992 Burrough et al. (1994) could state “Cheap, powerful personal computers have made geostatistics available to many people and have stimulated the teaching of the methods in universities”, and listed 11 programs, including some that have continued in use (e.g., PC-RASTER, SURFER, Genstat) and others that were once popular but now have been superseded (e.g., GEO-EAS). They did not further specify their concept of “powerful”. A typical paper of this period is by Dobermann (1994), who performed exploratory data analysis (summaries, scatterplots, lowess smoothing), factor analysis (including Varimax rotation), stepwise and hierarchical multiple linear regression (CSS/STATISTICA), variogram analysis (SPATANAL), fitting by weighted least squares, and block kriging (GEO-EAS, PC-RASTER). Among the methods which were computationally feasible by 1992 were multivariate geostatistics, cross-validation, jackknifing, principal components, factor analysis, varimax rotation, spectral analysis by Fast Fourier Transform, sets of partial differential equations, and simulation by process models. An example of the latter is simulation of solute leaching from a three-dimensional heterogeneous field: three-dimensional transport problems with nonlinear adsorption (Bosma et al., 1994). By 1995 large numbers of Monte Carlo simulations became feasible on desktop computers. An example application is by Dobermann and Oberthür (1997), who performed 10 000 simulations of joint fuzzy membership functions, using the ADAM program developed by Wesseling & Heuvelink at Utrecht. Advances in GIS allowed terrain parameters to be computed over moderately-sized areas (several kha), as covariates for digital soil mapping (DSM) (e.g., Lagacherie et al., 1997). 1997 was a breakthrough year for electronic communication. Journals publishing pedometrics papers started to print authors' e-mail addresses, and institutions reserved e-mail address blocks for their staff.
Desktop computer power continued to increase. For example, van Groenigen and Stein (1998) were able to use spatial simulated annealing for soil sampling optimization of 37 points at 20 000 possible locations over an irregular area; this computation required 5 h on a Pentium PC @ 120 MHz. Digital soil mappers began to use GPS for geolocation, although because of selective availability (not turned off until May 2000) a base station was required for position correction (McKenzie and Ryan, 1999). An especially important development for process models is parallel computing; this began in the 1970s and reached routine operational status in the early 1980s. This required developments of the computers themselves, their interconnection, and especially compilers and communications protocols to break complex problems into multiple subproblems which can be solved in parallel and then combined. This is important when fine discretizations of space and/or time are needed, due to non-linearity. An early example in pedometrics is Vereecken et al. (1996); however the routine use of parallel computing is more recent, from the mid-2010s, made possible by cheap remote cloud computing and standardized methods for parallelization.
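The spatial simulated annealing mentioned above for sampling optimization can be illustrated with a minimal sketch. It assumes a mean-shortest-distance criterion, a simple geometric cooling schedule and a toy 20 × 20 grid of candidate locations; these choices, and all function and parameter names, are illustrative and are not taken from the cited studies.

```python
import math
import random


def mean_shortest_distance(samples, candidates):
    """Mean distance from each candidate location to its nearest sample point."""
    total = 0.0
    for cx, cy in candidates:
        total += min(math.hypot(cx - sx, cy - sy) for sx, sy in samples)
    return total / len(candidates)


def spatial_simulated_annealing(candidates, n_samples, n_iter=500,
                                t_start=1.0, cooling=0.99, seed=0):
    """Hypothetical minimal annealer: move one sample point to a random
    candidate location; accept a worse design with a probability that
    shrinks as the temperature drops."""
    rng = random.Random(seed)
    current = rng.sample(candidates, n_samples)
    current_cost = mean_shortest_distance(current, candidates)
    best, best_cost = list(current), current_cost
    temperature = t_start
    for _ in range(n_iter):
        proposal = list(current)
        proposal[rng.randrange(n_samples)] = rng.choice(candidates)
        cost = mean_shortest_distance(proposal, candidates)
        delta = cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = proposal, cost
            if cost < best_cost:
                best, best_cost = list(proposal), cost
        temperature *= cooling
    return best, best_cost


# Toy example: 5 sample points chosen among 400 candidate locations
grid = [(x, y) for x in range(20) for y in range(20)]
initial_cost = mean_shortest_distance(random.Random(0).sample(grid, 5), grid)
design, final_cost = spatial_simulated_annealing(grid, n_samples=5)
print(f"criterion: {initial_cost:.2f} at start, {final_cost:.2f} after annealing")
```

Every iteration re-evaluates the criterion over all candidate locations, so the cost grows with the product of iterations, candidates and sample points. This is consistent with the hours of computing time reported for 37 points and 20 000 locations on 1990s hardware.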
A good representation of the use of IT in recent years is provided by the eight papers selected for the “best paper in Pedometrics 2016” competition sponsored by the Pedometrics Commission of the IUSS. None of these would be possible without modern IT. An example is the global spectral library of Viscarra Rossel et al. (2016). Building this library required machine learning, fuzzy c-means, correspondence analysis, and wavelets computed on 12 509 spectra × 216 wavelengths [NIPALS], as well as vis-NIR lab. and field instruments. Computation on large objects is now routine; in this paper the matrix of the spectra is 2.57 GB. Storage and transmission of large datasets is also routine; this paper provides a 9 Mb dataset for download. This group of papers also includes one (Lark, 2016) on the optimization of wireless sensor networks, including routing of sensors mounted on unmanned aerial vehicles (UAV). For the application of pedometric techniques to DSM, Minasny and McBratney (2016, Section 5.1) developed a formula implying that the number of grid cells that can be mapped by DSM increases at a rate of ≈ 2 a⁻¹. As of 2016 the largest project was POLARIS (see below), with ≈ 9 × 10⁹ grid cells; however this required a supercomputer. Projects with more conventional computing resources were able to map one order of magnitude fewer cells, ≈ 10⁹. Despite the almost magical advances in IT, there are still some limitations facing pedometricians using computationally-intensive models, especially simulations and mappings over large areas. For example, Poggio et al. (2016) performed Bayesian spatial modelling of soil properties and their uncertainty using the INLA (Integrated Nested Laplace Approximation) and SPDE (Stochastic Partial Differential Equation) methods. However, to apply the method over 12 100 km² in Scotland, using 3106 point observations, the authors could not model the entire area as one unit, and had to subset the area with a “chessboard” approach.
The paper on sensor optimization mentioned in the previous paragraph (Lark, 2016) used a spatial simulated annealing algorithm (AMOSA). The first run of the algorithm to find 20 points required 2000 iterations, 40 stored solutions, and 10 h of clock time on a 3.40 GHz processor with 8 GB RAM. Román Dobarco et al. (2017) applied two methods (Granger–Ramanathan and Bates–Granger) of model averaging to combine two regional maps and a European map of topsoil texture in the Central region (F). In describing their methods they state “[W]e did not produce maps of the upper and lower prediction limits due to the high computational cost”. Malone et al. (2014) rejected the routine application of the Bates–Granger variance weighted averaging method for ensemble predictions of soil pH at 30 × 30 m grid over a 68 000 km² area (approximately 75M grid cells), in favour of the Granger–Ramanathan averaging method, despite its somewhat poorer performance, for computational reasons. Hengl et al. (2017b) produced 15 nutrient maps at four depths of sub-Saharan Africa at 250 m grid resolution over 23.6M km², i.e., about 23G predictions, based on about 50k measurements per nutrient. In late 2016 (paper submitted 19 January 2017) this required approximately 40 h on a dedicated server with 256 GiB RAM and 48 cores. The authors state that “[w]ith some minor additional investments in computing infrastructure, spatial predictions could be updated in future within 24 hrs”. IT limitations are also evident in models of soil processes (Vereecken et al., 2016) and soil-landscape evolution (Opolot et al., 2015). Models such as SoilGen2 (Finke, 2012), which is a pedon scale 1D soil formation model based on physical principles of pedogenesis, require many simulation runs to assess the sensitivity to the many calibration parameters, as well as inputs.
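The grid sizes quoted above follow from dividing the mapped area by the area of one cell. A minimal sketch of that arithmetic, using only the areas, resolutions and map counts stated in the text (the function name `grid_cells` is illustrative):

```python
def grid_cells(area_km2, resolution_m):
    """Number of square cells of side resolution_m needed to cover area_km2."""
    area_m2 = area_km2 * 1_000_000
    return area_m2 / resolution_m ** 2


# Malone et al. (2014): 68 000 km2 at 30 m resolution
cells_30m = grid_cells(68_000, 30)

# Hengl et al. (2017b): 23.6M km2 at 250 m, 15 nutrients at 4 depths
cells_250m = grid_cells(23.6e6, 250)
predictions = cells_250m * 15 * 4

print(f"{cells_30m / 1e6:.1f}M cells")          # 75.6M cells
print(f"{predictions / 1e9:.1f}G predictions")  # 22.7G predictions
```

Both results agree with the rounded figures in the text (about 75M cells and about 23G predictions). Because the cell count scales with the inverse square of the resolution, halving the cell size quadruples the number of predictions.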
For example, a recent paper (Keyvanshokouhi et al., 2016) submitted to the journal in May 2016 reported that a typical SoilGen2 run simulating a 30-compartment soil profile over a 15 000-year period required 180 h CPU time on a dedicated cluster of 7 PCs, each with 4 CPU @ 2.4 GHz, 12 GB RAM. This configuration allows 28 parallel runs (Peter Finke, personal communication). Similarly, a recent paper (Temme and Vanwalleghem, 2016), received at the journal in February 2015, introduced the LORICA soil-
4. Current status
Processing power has reached previously-unimaginable levels, at low cost and thus with easy access. For example, as of mid-2017 a Samsung Galaxy S8 smartphone weighs only 200 g, has 4 GB RAM; 128 GB flash memory; a 256 GB microSD; a Kryo 280, an 8-core CPU @2.45 GHz; and an Adreno 540 graphics processor @ 710 MHz, all for about $800. By comparison, the 1960s Apollo missions used a mainframe IBM System/360 Model 75 computer with 1 MB RAM (4000 × less) @ 8 MHz (300 × slower) for $3.5M (4365 × more expensive, unadjusted for inflation). The Apollo Guidance Computer, which was “personal” for the astronauts, had 64 kB RAM, and a CPU @ 0.043 MHz. A modern workstation is the Dell Precision Tower 5810, with 8 GB RAM; 500 GB hard disk; Intel Xeon E5 v3 CPU with 4 cores @ 3.6 GHz, costing about $2k. In 1993 a comparable offering from Dell was an i486 CPU @ 66 MHz (55 × slower), 8 MB RAM (1024 × less); 320 MB hard disk (1560 × smaller) for $4.4k (2.2 × more expensive).
that the volume, variety, velocity, veracity (i.e., data quality, uncertainty), and especially the connectedness of sensor data is driving a new paradigm for science exploration and discovery. This is part of the so-called “fourth paradigm” (Hey et al., 2009) for scientific discovery based on data-intensive computing, which was proposed by computer scientist Jim Gray in 2007: using computers to understand the massive amount of digital data now available or to be captured, curated and analyzed. Other important developments in IT are not specific to pedometrics, statistics or soil modelling, but apply to all scientists; yet these have had a large effect on pedometricians' working methods. Among these are the support for finding previous and current research from large on-line reference databases (e.g., Web of Science, Google Scholar, Elsevier ScienceDirect, JSTOR), with hyperlinks, forward and backwards references, keyword search and full texts. Papers from the pre-digital age are rapidly being converted to digital format and included in these databases. The contents of digital versions of papers are searchable, allowing easy access to key words or phrases. This is a great time-saver, and makes it much easier to find all relevant references. Papers receive a Digital Object Identifier (DOI)², which allows rapid and unambiguous linking to the abstract and full text. These platforms also provide citation alerts, table of contents alerts, and customized alerts (e.g., author, keyword, topic), ensuring that new relevant work is not overlooked. There is now sufficient storage on personal computers for a substantial collection of full-texts, and essentially unlimited cloud storage. The process of publishing papers has also become much easier due to developments in IT. Communication between authors, editors and reviewers is rapid and transparent.
Papers in the traditional review process can still take more than a year, but this is largely because of problems finding reviewers and the workload of editors. There is now a movement to direct publication with open on-line reviews, e.g., the SOIL interactive journal of the European Geosciences Union³. On-line data storage allows the easy provision of supplementary materials, datasets, and computer code; the page limitations of paper journals are no longer relevant. Another sort of publishing is now possible due to cloud storage: data repositories, stable and citable (with a DOI), and publicly-accessible. Increasingly data publishing is a requirement for publication in a journal or as part of a project report. Pedometricians use, adapt, and write computer programs. These are increasingly developed according to the open-source paradigm⁴: nondiscriminatory free distribution of source code, allowing derived products. Code repositories such as GitHub⁵ facilitate collaborative development. An example of a collaborative programming project is the Algorithms for Quantitative Pedology (AQP) R package based at the California Soil Resource Lab (Beaudette et al., 2013). Linked to this is the use of open standards for data formats and exchange, for example the ISO standard for digital exchange of soil-related data (International Organization for Standardization (ISO), 2013) and the GODAN (Global Open Data for Agriculture & Nutrition) Soil Data Working Group⁶. An interesting development which has not yet been perfected, but still is useful for pedometricians, is on-line machine translation. Although most prominent journals contain only papers in English, still there are many reports and local studies in other languages, especially for work commissioned by local authorities, including soil surveys.
Translation from Germanic and Romance languages to English of structured scientific texts is generally good enough to understand the meaning; from other languages the results are less satisfactory, but the transition from attempts to understand grammar to translation based on bilingual corpora (full texts) is rapidly improving these results.
landscape evolution model, which simulates geomorphic processes, physical and chemical weathering, clay translocation, bioturbation and the C cycle. Each model run on a hypothetical landscape of 101 by 51 cells of 20 m cell size, each with 22 depth slices, for 5000 years, required about 30 min on a “standard desktop computer”. Thus a full sensitivity analysis for the 28 input parameters was not feasible, so that a global screening method had to be used to limit runtimes. Pedometricians have rarely entered the world of supercomputing, where the computing power is orders of magnitude more than can be accomplished on the desktop or with small clusters of servers. An example is the POLARIS map of the contiguous USA at 30 m spatial resolution (≈ 9 × 10⁹ grid cells) (Chaney et al., 2016). This used a moving-window (30 × 30 km with 60 km buffer) DSMART algorithm (Odgers et al., 2014). At each grid cell the 50 most probable soil series and associated uncertainties were computed. This required ≈ 450 000 core-hours on the Blue Waters supercomputer at the (USA) National Center for Supercomputing Applications (NCSA), but this corresponded to only 5 h clock time. This is a trivial problem compared to the usual applications (e.g., particle physics, fluid dynamics, climate dynamics, astrophysics) of this supercomputer. The authors state that this is “negligible computer time at current HPC facilities that can handle 10⁷ (≈1100 years) core-hour tasks”. Some important developments in IT are not specific to pedometrics, but to the entire earth science community. The most obvious is the proliferation of remote sensing systems (satellite and airborne) as well as near sensors (field scale) and laboratory instruments. Field data acquisition is the source of data for the earth sciences. Here too there has been explosive growth in the variety, quantity and quality of sensors.
Perhaps the most significant is the development of global navigation satellite systems (GNSS), for example the Global Positioning System (GPS), and associated receivers and computer programs. This makes accurate georeference (commercial grade: 3 m; differential grade: 1 m horizontal)¹ of field data a routine part of data collection anywhere in the world. This assures high accuracy in maps made by predictive pedometric methods. Another development is the data logger, which digitally stores information received directly from digital sensors (such as a GPS or automatic weather stations) and then passes it on to a computer for processing, generally unattended over a long time period, but also with an operator. Data loggers generally place a time-stamp on acquired data, allowing time-series analysis; if combined with location recording, spatio-temporal analysis is possible; both of these with a large quantity of fine-resolution data. Complete systems with multiple sensors are now possible. An example is the study of Viscarra Rossel et al. (2017), which combines a γ-ray attenuation densitometer (for bulk density), digital cameras, and a visible near-infrared (vis-NIR) spectrometer (for iron oxides and clay mineralogy). Multiple sensor systems are now routine in so-called “precision agriculture” (Viscarra Rossel and Bouma, 2016). Among the sensor developments in the soil laboratory, the lab. spectrometer, primarily for the visible and near infrared (vis–NIR, 400–2500 nm wavelengths) range, has been applied successfully as a replacement for wet chemistry, leading to increased standardization of measurements (Viscarra Rossel et al., 2016). Laser diffraction granulometry has been used to replace wet chemistry determination of particle-size fractions (Makó et al., 2017) and to assess aggregate stability (Rawlins et al., 2015). Imaging methods such as laser scanning (Hirmas et al., 2016) and X-ray computed tomography (Helliwell et al., 2013) have been applied to quantify soil structure.
These are just some of the IT-based tools used in the emerging field of Digital Soil Morphometrics (Hartemink and Minasny, 2016). The important point is that acquired data is increasingly objective, abundant, and in digital form to allow analysis. With respect to sensors, Roudier et al. (2015) go further, asserting
¹ https://water.usgs.gov/osw/gps/
² https://doi.org/
³ https://www.soil-journal.net/
⁴ https://opensource.org/
⁵ https://github.com/
⁶ http://www.godan.info/working-groups/soil-data
theory. There is a similar caution for numerical taxonomy. Limited computing power forced the early numerical taxonomists to carefully choose characteristics, according to their concepts of pedogenesis and the basis of classification systems. With increasing computing power there is a temptation to use all available characteristics; this can lead to discovery of unforeseen patterns but also to meaningless relations. The “digital divide” refers to IT limitations experienced by pedometricians working in some countries. Although IT infrastructure reaches almost the whole world, there are places where bandwidth is severely limited, making it very difficult to download journal articles, manuals, or datasets. There is commercial interest in improving access, especially by means of wireless networks, but government regulation can stand in the way. A related issue is restriction on access to all or parts of the internet by governments. Some research resources cannot be used; a well-known example is Google Scholar in the People's Republic of China. Researchers and governments play a “cat and mouse” game with virtual private networks (VPNs) and mirrored websites. Another sort of digital divide is caused by information which must be purchased. Researchers in institutions with resource-rich libraries can access a wider range of information than those in other institutions. Prominent examples are reference databases, e.g., Web of Science, and full-texts of non-open access papers at all commercial publishers. This evident unfairness means that some researchers cannot reach their full potential through no fault of their own. This is one of the motivations behind the “open access” movement (Suber, 2012): online research outputs that are freely accessible and usable in further research. However, most open access publishers require payment by the authors.
For example, as of March 2018 the article processing charge for open access papers published by the EJSS was $3700.7 These costs may be included in project funding, but that is less likely for resource-poor researchers on the wrong side of the divide. Some otherwise restricted material is provided by repositories, some operating within the copyright agreements between authors and publishers, and some not.
Another striking development is in communication and networking among collaborating pedometricians. From 1971 communication was possible by e-mail (no attachments, small messages) and file transfer (small files) over telephone links. From about 1982 internet protocols were developed and major universities and research centres in the USA were networked. Networking expanded rapidly in the mid-1980s, especially with the advent of commercial internet service providers (ISP) in the late 1980s. These developments facilitated collaboration also across time zones and rapid turnaround of research. For example, instead of mailing drafts of papers or results from analysis (taking 1 day to several weeks), e-mailing or file transfer is almost instantaneous. Voice-over-internet (VOIP) telephony, conference calls and video conferencing were developed in the 1990s and became widely available in the mid-2000s; these also facilitate collaboration. A related development is wireless technology, both for telephones, internet, and data. This allows flexible working conditions, with a affordable home networks, easy access in public libraries, public transport, coffee shops, etc. The pedometrician is not bound to the office and can work almost anywhere at any time. Whether this is a good social development may be questioned, but it certainly facilitates work. 5. Cautions With all the advances in IT, and their enthusiastic adoption by the pedometrics community, there are some cautions that must be considered: (1) the temptation of technology; (2) the digital divide; (3) restricted information due to internet censorship; (4) restricted information due to commercialization of information; and (5) supplydriven research, divorced from societal needs. The temptation of technology can also be phrased as the rhetorical question: does the methodological ‘tail’ wag the substantive ‘dog’? 
Already in 1977 Miller (1977), discussing the rapid adoption of path analysis in the social sciences, wrote: “Fascination with powerful (but perhaps deadly) techniques (and an unmistakable pressure to use them) [is a manifestation of] Kaplan's law of the instrument: Give a small [child] a hammer, and he will find that everything he encounters needs pounding… Canned computer programs [and powerful IT infrastructure] have added highly combustible fuel to this fire of methodological abuse.” One pedometric application area where this can be a problem is DSM. Pure machine learning (“data mining”) approaches (e.g., Hengl et al., 2017a) use all available covariates and no model, thus the analyst does not have to make choices about covariates or structure. This is now possible because of computer power, data storage and easy access to covariate coverages, and can certainly produce interesting and unbiased maps. However this approach ignores pedological knowledge in model building, and the results are difficult to interpret in order to develop new or revised theories of soil-landscape relations. Another trend in DSM is to apply numerous mapping methods, compare the results mechanically, and automatically select the “best” according to a purely statistical evaluation (e.g., Sun et al., 2014; Wang et al., 2013). These papers rarely discuss which model structures correspond to reasonable hypotheses about soil-landscape relations, nor why, in a given context, one approach is chosen; they often do not even justify the choice of statistical evaluation. Advances in IT, in particular sensor technology, can provide ample data with which to numerically evaluate the accuracy of these methods. Discrepancies with field evaluation can point to problems with either theory or models. IT is not here the problem as such, but it enables pedology-free inference. IT can also help pedological insight in DSM. 
For example, programs for Structural Equation Modelling can combine pedologic knowledge with computation to reveal insights about soil processes from data (Angelini et al., 2017). As another example, the predictions made by data mining can be compared to maps made with subjective expert opinion, and discrepancies examined to gain insight into either improved machine learning approaches or revision of soil geographic
6. Future prospects

In the near future pedometricians can expect almost no IT restrictions on what can be computed for pedometric problems; almost no IT restrictions on the size of databases (e.g., covariates for DSM); real-time distributed sensor networks with automatic pre-processing, also called the “internet of things”; and effortless (from the IT point of view!) communication and collaboration. However, with increasing computer power, soil-landscape models used to understand and map soil geography can become ever more complicated (more processes, finer timesteps, wider geographic areas), so these will always be at the limit of computing resources.

IT has reached the stage where the possibilities for data acquisition (via remote, field, and laboratory sensors) provide virtually unlimited digital data, at ever finer spatial and temporal resolutions. The challenge now is to manage the flood of information and extract meaning. This is related to the concept of the “fourth paradigm” (Hey et al., 2009) for scientific discovery based on data-intensive computing, following the first (empirical: describing natural phenomena), second (theoretical: using models and generalizations), and third (computational: simulating complex phenomena) paradigms, according to Gray (2009). This fourth paradigm is data exploration: unifying experiment, theory, and simulation. In pedometric terms, these are observations, empirical and theory-based models, and simulations of soil processes.

A larger challenge is to ensure that, with the increasing supply of digital information and the increasing power of computing resources to drive ever more sophisticated algorithms, pedometrics does not become supply-driven. Progress in soil science in general, and pedometrics in particular, has been most relevant when a clear societal problem has driven new developments. For example, traditional soil survey was motivated by the need for land suitability evaluation. Similarly, the development of geostatistics was motivated by mineral resource assessment, and in soil survey by soil pollution assessment. Pedometricians should assess the relevance of their work to current societal needs, such as the UN's Sustainable Development Goals (SDGs) (Keesstra et al., 2016) or sustainable intensification of agriculture (Schulte et al., 2014). Certainly the possibilities offered by today's powerful IT can be applied to these pressing problems.

7 https://authorservices.wiley.com/author-resources/Journal-Authors/licensing-openaccess/open-access/article-publication-charges.html.
Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.geoderma.2018.03.009.

References

Angelini, M.E., Heuvelink, G.B.M., Kempen, B., 2017. Multivariate mapping of soil with structural equation modelling. Eur. J. Soil Sci. 68 (5), 575–591. http://dx.doi.org/10.1111/ejss.12446.
Anonymous, 2016. Introduction to the special issues: advances in pedometrics. Geoderma 263, 215. http://dx.doi.org/10.1016/j.geoderma.2015.10.006.
Arkley, R.J., 1971. Factor analysis and numerical taxonomy of soils. Soil Sci. Soc. Am. J. 35, 312–315. http://dx.doi.org/10.2136/sssaj1971.03615995003500020038x.
Beaudette, D.E., Roudier, P., O’Geen, A.T., 2013. Algorithms for quantitative pedology: a toolkit for soil scientists. Comput. Geosci. 52, 258–268. http://dx.doi.org/10.1016/j.cageo.2012.10.020.
Bidwell, O., Hole, F., 1964. An experiment in the numerical classification of some Kansas soils. Soil Sci. Soc. Am. Proc. 28, 263–268. http://dx.doi.org/10.2136/sssaj1964.03615995002800020039x.
Bie, S., Schelling, J., 1978. An integrated information system for soil survey. In: Sadovski, A.N., Bie, S. (Eds.), Developments in Soil Information Systems: Proceedings of the Second Meeting of the ISSS Working Group on Soil Information Systems, Varna/Sofia, Bulgaria, 30 May–4 June 1977. Pudoc, pp. 43–49.
Borůvka, L., Brus, D., Zhu, A.-X., 2014. Innovations in pedometrics. Geoderma 213, 570. http://dx.doi.org/10.1016/j.geoderma.2013.10.018.
Bosma, W.J.P., Marinussen, M.P., van der Zee, S.E., 1994. Simulation and areal interpolation of reactive solute transport. Geoderma 62, 217–231. http://dx.doi.org/10.1016/0016-7061(94)90037-X.
Burrough, P.A., 1991. Soil information systems. In: Geographical Information Systems. vol. 2. Longmans, pp. 153–169.
Burrough, P.A., Bouma, J., Yates, S.R., 1994. The state of the art in pedometrics. Geoderma 62, 311–326. http://dx.doi.org/10.1016/0016-7061(94)90043-4.
Buurman, P., Sevink, J., 1995. Van bodemkaart tot informatiesysteem. SC-DLO Wageningen Pers (In Dutch).
Chaney, N.W., Wood, E.F., McBratney, A.B., Hempel, J.W., Nauman, T.W., Brungard, C.W., Odgers, N.P., 2016. POLARIS: a 30-meter probabilistic soil series map of the contiguous United States. Geoderma 274, 54–67. http://dx.doi.org/10.1016/j.geoderma.2016.03.025.
Cochran, W.G., 1937. A catalogue of uniformity trial data. Suppl. J. R. Stat. Soc. 4, 233–253.
De Gruijter, J.J., 1999. Preface: pedometrics 1997. Geoderma 89, 7–8. http://dx.doi.org/10.1016/S0016-7061(98)00145-1.
De Gruijter, J.J., Webster, R., Myers, D.E., 1994. Preface: special issue, proceedings of the first conference of the Working Group on Pedometrics of the International Society of Soil Science (ISSS). Geoderma 62, vii–viii. http://dx.doi.org/10.1016/0016-7061(94)90021-3.
de Gruijter, J., McBratney, A., McSweeney, K., 1997. Preface: fuzzy sets in soil science. Geoderma 77, 83. http://dx.doi.org/10.1016/S0016-7061(97)00016-5.
Dobermann, A., 1994. Factors causing field variation of direct-seeded flooded rice. Geoderma 62, 125–150. http://dx.doi.org/10.1016/0016-7061(94)90032-9.
Dobermann, A., Oberthür, T., 1997. Fuzzy mapping of soil fertility - a case study on irrigated riceland in the Philippines. Geoderma 77, 317–339. http://dx.doi.org/10.1016/S0016-7061(97)00028-1.
Dongarra, J.J., 1979. LINPACK Users' Guide. Society for Industrial and Applied Mathematics.
Eastman, J.R., Fulk, M., 1993. Long sequence time-series evaluation using standardized principal components (reprinted from Photogrammetric Engineering and Remote Sensing, vol 59, pg 991–996, 1993). Photogramm. Eng. Remote. Sens. 59, 1307–1312.
Finke, P.A., 2012. Modeling the genesis of luvisols as a function of topographic position in loess parent material. Quat. Int. 265, 3–17. http://dx.doi.org/10.1016/j.quaint.2011.10.016.
Gray, J., 2009. Jim Gray on eScience: a transformed scientific method. In: Hey, T., Tansley, S., Tolle, K. (Eds.), The Fourth Paradigm: Data-intensive Scientific Discovery, pp. xvii–xxxi.
Haines, W.B., Keen, B.A., 1925a. Studies in soil cultivation. II. A test of soil uniformity by means of dynamometer and plough. J. Agric. Sci. 15, 387–394. http://dx.doi.org/10.1017/S0021859600006821.
Haines, W.B., Keen, B.A., 1925b. Studies in soil cultivation. III. Measurements on the Rothamsted classical plots by means of dynamometer and plough. J. Agric. Sci. 15, 395–406. http://dx.doi.org/10.1017/S0021859600006833.
Hartemink, A.E., Minasny, B. (Eds.), 2016. Digital Soil Morphometrics. Progress in Soil Science. Springer International.
Helliwell, J.R., Sturrock, C.J., Grayling, K.M., Tracy, S.R., Flavel, R.J., Young, I.M., Whalley, W.R., Mooney, S.J., 2013. Applications of X-ray computed tomography for examining biophysical interactions and structural development in soil systems: a review. Eur. J. Soil Sci. 64, 279–297. http://dx.doi.org/10.1111/ejss.12028.
Hengl, T., Jesus, J.M.d., Heuvelink, G.B.M., Gonzalez, M.R., Kilibarda, M., Blagotić, A., Shangguan, W., Wright, M.N., Geng, X., Bauer-Marschallinger, B., et al., 2017a. SoilGrids250m: global gridded soil information based on machine learning. PLOS ONE 12, e0169748. http://dx.doi.org/10.1371/journal.pone.0169748.
Hengl, T., Leenaars, J.G.B., Shepherd, K.D., Walsh, M.G., Heuvelink, G.B.M., Mamo, T., Tilahun, H., Berkhout, E., Cooper, M., Fegraus, E., et al., 2017b. Soil nutrient maps of sub-Saharan Africa: assessment of soil nutrient content at 250 m spatial resolution using machine learning. Nutr. Cycl. Agroecosyst. 109, 77–102. http://dx.doi.org/10.1007/s10705-017-9870-x.
Heuvelink, G., 2003. The definition of pedometrics. Pedometron 15, 11–12.
Hey, T., Tansley, S., Tolle, K. (Eds.), 2009. The Fourth Paradigm: Data-intensive Scientific Discovery. Microsoft Research.
Hirmas, D.R., Giménez, D., Mome Filho, E.A., Patterson, M., Drager, K., Platt, B.F., Eck, D.V., 2016. Quantifying soil structure and porosity using three-dimensional laser scanning. In: Hartemink, A.E., Minasny, B. (Eds.), Digital Soil Morphometrics. Progress in Soil Science. Springer International, pp. 19–36.
Hole, F.D., Hironaka, M., 1960. An experiment in ordination of some soil profiles. Soil Sci. Soc. Am. Proc. 24, 309–312.
ISO 28258:2013 Soil Quality - Digital Exchange of Soil-related Data.
Keesstra, S.D., Bouma, J., Wallinga, J., Tittonell, P., Smith, P., Cerdà, A., Montanarella, L., Quinton, J.N., Pachepsky, Y., van der Putten, W.H., et al., 2016. The significance of soils and soil science towards realization of the United Nations sustainable development goals. SOIL 2, 111–128. http://dx.doi.org/10.5194/soil-2-111-2016.
Keyvanshokouhi, S., Cornu, S., Samouelian, A., Finke, P., 2016. Evaluating SoilGen2 as a tool for projecting soil evolution induced by global change. Sci. Total Environ. 571, 110–123. http://dx.doi.org/10.1016/j.scitotenv.2016.07.119.
Lagacherie, P., Cazemier, D., van Gaans, P., Burrough, P., 1997. Fuzzy k-means clustering of fields in an elementary catchment and extrapolation to a larger area. Geoderma 77, 197–216. http://dx.doi.org/10.1016/S0016-7061(97)00022-0.
Lark, R.M., 2016. Multi-objective optimization of spatial sampling. Spat. Stat. 18, Part B, 412–430. http://dx.doi.org/10.1016/j.spasta.2016.09.001.
Makó, A., Tóth, G., Weynants, M., Rajkai, K., Hermann, T., Tóth, B., 2017. Pedotransfer functions for converting laser diffraction particle-size data to conventional values. Eur. J. Soil Sci. 68, 769–782. http://dx.doi.org/10.1111/ejss.12456.
Malone, B.P., Minasny, B., Odgers, N.P., McBratney, A.B., 2014. Using model averaging to combine soil property rasters from legacy soil maps and from point data. Geoderma 232–234, 34–44. http://dx.doi.org/10.1016/j.geoderma.2014.04.033.
McKenzie, N.J., Ryan, P.J., 1999. Spatial prediction of soil properties using environmental correlation. Geoderma 89, 67–94.
Mercer, W.B., Hall, A.D., 1911. The experimental error of field trials. J. Agric. Sci. (Camb.) 4, 107–132.
Miller, M.K., 1977. Potentials and pitfalls of path-analysis - tutorial summary. Qual. Quant. 11, 329–346. http://dx.doi.org/10.1007/BF00143958.
Minasny, B., McBratney, A.B., 2016. Digital soil mapping: a brief history and some lessons. Geoderma 264, Part B, 301–311. http://dx.doi.org/10.1016/j.geoderma.2015.07.017.
Odeh, I.O., McBratney, A.B., 2001. Estimating uncertainty in soil models (Pedometrics ’99). Geoderma 103, 1. http://dx.doi.org/10.1016/S0016-7061(01)00066-0.
Odgers, N.P., McBratney, A.B., Minasny, B., Sun, W., Clifford, D., 2014. DSMART: an algorithm to spatially disaggregate soil map units. In: Arrouays, D., McKenzie, N., Hempel, J., DeForges, A.C.R., McBratney, A. (Eds.), GlobalSoilMap: Basis of the Global Spatial Soil Information System. CRC Press-Taylor & Francis Group, pp. 261–266.
Oliver, M.A., Lark, R.M., 2005. Preface: Pedometrics 2003. Geoderma 128, 177–178. http://dx.doi.org/10.1016/j.geoderma.2005.04.001.
Opolot, E., Yu, Y.Y., Finke, P.A., 2015. Modeling soil genesis at pedon and landscape scales: achievements and problems. Quat. Int. 376, 34–46. http://dx.doi.org/10.1016/j.quaint.2014.02.017.
Poggio, L., Gimona, A., Spezia, L., Brewer, M.J., 2016. Bayesian spatial modelling of soil properties and their uncertainty: the example of soil organic matter in Scotland using R-INLA. Geoderma 277, 69–82. http://dx.doi.org/10.1016/j.geoderma.2016.04.026.
Press, W.H., 1986. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press.
Rawlins, B.G., Turner, G., Wragg, J., McLachlan, P., Lark, R.M., 2015. An improved method for measurement of soil aggregate stability using laser granulometry applied at regional scale. Eur. J. Soil Sci. 66, 604–614. http://dx.doi.org/10.1111/ejss.12250.
Rayner, J.H., 1966. Classification of soils by numerical methods. J. Soil Sci. 17, 79–92. http://dx.doi.org/10.1111/j.1365-2389.1966.tb01454.x.
Román Dobarco, M., Arrouays, D., Lagacherie, P., Ciampalini, R., Saby, N.P.A., 2017. Prediction of topsoil texture for Region Centre (France) applying model ensemble methods. Geoderma 298, 67–77. http://dx.doi.org/10.1016/j.geoderma.2017.03.015.
Roudier, P., Ritchie, A., Hedley, C., Medyckyj-Scott, D., 2015. The rise of information science: a changing landscape for soil science. IOP Conf. Ser. Earth Environ. Sci. (1755-1315) 25. http://dx.doi.org/10.1088/1755-1315/25/1/012023.
Schulte, R.P.O., Creamer, R.E., Donnellan, T., Farrelly, N., Fealy, R., O’Donoghue, C., O’hUallachain, D., 2014. Functional land management: a framework for managing soil-based ecosystem services for the sustainable intensification of agriculture. Environ. Sci. Pol. 38, 45–58. http://dx.doi.org/10.1016/j.envsci.2013.10.002.
Suber, P., 2012. Open Access. MIT Press Essential Knowledge Series. MIT Press.
Sun, X.-L., Wu, Y.-J., Wang, H.-L., Zhao, Y.-G., Zhang, G.-L., 2014. Mapping soil particle size fractions using compositional kriging, cokriging and additive log-ratio cokriging in two case studies. Math. Geosci. 46, 429–443. http://dx.doi.org/10.1007/s11004-013-9512-z.
Temme, A.J.A.M., Vanwalleghem, T., 2016. LORICA - a new model for linking landscape and soil profile evolution: development and sensitivity analysis. Comput. Geosci. 90, 131–143. http://dx.doi.org/10.1016/j.cageo.2015.08.004.
van Groenigen, J.W., Stein, A., 1998. Constrained optimization of spatial sampling using continuous simulated annealing. J. Environ. Qual. 27, 1078–1086.
Van Meirvenne, M., Goovaerts, P., 2003. Preface: pedometrics 2001. Geoderma 112, 167. http://dx.doi.org/10.1016/S0016-7061(02)00303-8.
Vanwalleghem, T., Tarquis, A., Minasny, B., 2018. Advances in pedometrics: special issue of pedometrics 2015, Córdoba. Geoderma 311, 91–92. http://dx.doi.org/10.1016/j.geoderma.2017.05.042.
Vereecken, H., Neuendorf, O., Lindenmayr, G., Basermann, A., 1996. A Schwarz domain decomposition method for solution of transient unsaturated water flow on parallel computers. Ecol. Model. 93, 275–289. http://dx.doi.org/10.1016/0304-3800(95)00224-3.
Vereecken, H., Schnepf, A., Hopmans, J., Javaux, M., Or, D., Roose, T., Vanderborght, J., Young, M., Amelung, W., Aitkenhead, M., et al., 2016. Modeling soil processes: review, key challenges, and new perspectives. Vadose Zone J. 15. http://dx.doi.org/10.2136/vzj2015.09.0131.
Viscarra Rossel, R.A., Behrens, T., Ben-Dor, E., Brown, D.J., Demattê, J.A.M., Shepherd, K.D., Shi, Z., Stenberg, B., Stevens, A., Adamchuk, V., et al., 2016. A global spectral library to characterize the world's soil. Earth-Sci. Rev. 155, 198–230. http://dx.doi.org/10.1016/j.earscirev.2016.01.012.
Viscarra Rossel, R.A., Bouma, J., 2016. Soil sensing: a new paradigm for agriculture. Agric. Syst. 148, 71–74. http://dx.doi.org/10.1016/j.agsy.2016.07.001.
Viscarra Rossel, R.A., Lobsey, C.R., Sharman, C., Flick, P., McLachlan, G., 2017. Novel proximal sensing for monitoring soil organic C stocks and condition. Environ. Sci. Technol. 51. http://dx.doi.org/10.1021/acs.est.7b00889.
Wang, K., Zhang, C., Li, W., 2013. Predictive mapping of soil total nitrogen at a regional scale: a comparison between geographically weighted regression and cokriging. Appl. Geogr. 42, 73–85. http://dx.doi.org/10.1016/j.apgeog.2013.04.002.
Webster, R., 1994. The development of pedometrics. Geoderma 62, 1–15. http://dx.doi.org/10.1016/0016-7061(94)90024-8.
Yates, S.R., Warrick, A.W., Myers, D.E., 1986. A disjunctive kriging program for two dimensions. Comput. Geosci. 12, 281–313. http://dx.doi.org/10.1016/0098-3004(86)90037-3.
Youden, W.J., Mehlich, A., 1937. Selection of efficient methods for soil sampling. Contributions of the Boyce Thompson Institute for Plant Research 9, 59–70.
Zhu, A.-X., Lark, M., Minasny, B., Huang, Y., 2012. Entering the digital world (Pedometrics 2009). Geoderma 171–172, 1–2. http://dx.doi.org/10.1016/j.geoderma.2012.01.005.