Producing marine geophysical archive files from raw underway data


Computers & Geosciences 133 (2019) 104321

Contents lists available at ScienceDirect

Computers and Geosciences journal homepage: www.elsevier.com/locate/cageo

Research paper

Producing marine geophysical archive files from raw underway data✩

Michael Hamilton a,∗, Paul Wessel a, Brian Taylor a, Joaquim Luis b

a Department of Earth Sciences, School of Ocean and Earth Science and Technology, University of Hawai‘i at Mānoa, 1680 East-West Rd., Honolulu, HI 96822, USA
b Dept. de Ciências da Terra, do Mar e do Ambiente, Universidade do Algarve, Faro, Portugal

ARTICLE INFO

Keywords: Algorithms; Data processing; Marine geophysical trackline data; Data assimilation; Data structures; MGD77

ABSTRACT

Preserving costly marine geophysical trackline data is of paramount importance, but variability in priorities, funding, personnel, and technology impacts our data archival capacity. We have addressed one crucial facet of this dilemma by devising an open source approach to merge and reduce underway geophysical data and to generate marine geophysical archive files using common command line programs along with the Generic Mapping Tools and its mgd77 supplement. Archive files generated using this approach retain full precision and may be converted automatically to MGD77T, MGD77+, and MGD77 formats. We successfully applied the approach to 340 geophysical data sets acquired by R/V Kilo Moana from 2002 to 2018, and in the near term we plan to submit the non-proprietary archive files to the National Centers for Environmental Information’s trackline geophysics archive. We encourage international oceanographic communities to explore our methodology, as a larger user base will strengthen the software and the procedures.

1. Introduction

Marine geophysical trackline data are widely used to populate bathymetric charts, interpret past plate motions via crustal magnetization studies, delimit maritime boundaries in areas where continental shelves extend beyond 200 nautical miles, explore for mineral resources, interpret seafloor morphology, and for many other purposes. We owe a good deal of our current understanding of Earth processes to geophysical data gathered at sea, where many forms of evidence supporting the plate tectonic theory continue to be uncovered. The difficulty and cost associated with acquiring these data require that we ensure their preservation, and this has long been recognized by the scientific community.

The National Centers for Environmental Information (NCEI), a division of the U.S. National Oceanic and Atmospheric Administration, maintains dozens of geophysical archives including the world’s largest repository for marine geophysical trackline data. As depicted in Fig. 1, NCEI currently archives some 6130 trackline geophysical data sets dating back to 1939. The data comprise single and center-beam depths (Fig. 1-top), towed total field magnetics (Fig. 1-middle), shipboard gravity (Fig. 1-bottom), and seismic shotpoint data (not shown). Although NCEI is an American repository supported by public funds, currently 33.8% of archived surveys originate overseas: 8.5% from the United Kingdom, 6.7% from Japan, 4.2% from both French and Canadian sources, and the remaining ∼10% from 16 other nations. An unknown amount of data collected by other international and private organizations remains on data servers and backup media around the world. NCEI’s willingness to preserve international data is commendable and this service has undoubtedly saved a multitude of data sets, both foreign and domestic, from unintended loss.

Among those involved with archival matters, there is perhaps more volatility than is desirable for the ongoing performance of this necessary function. Advances in technology have in the past led to deprecation of previously collected data sets. For example, geophysical data acquired during the GPS era supersede data navigated celestially. However, many of the existing data were collected prior to GPS navigation, hence we must archive both types until the outdated portions can be resurveyed. Shifting priorities, budget cuts, and personnel changes can also interfere with the archival process by depriving universities and government agencies of their in-house archival expertise, and data repositories of sufficient resources to ensure that only high quality, processed data are archived.

Given that many source institutions no longer support in-house archive file preparation, there is currently a need for open source methods for the production of marine geophysical archive files from raw underway data. We present such an open source approach here, which we developed over the course of our past research (Wessel and Chandler, 2007; Chandler and Wessel, 2008, 2012; Hamilton et al., 2019) and in addressing marine geophysical data challenges here at

✩ Link to source code: github.com/GenericMappingTools/ship2mgd77.
∗ Corresponding author. E-mail address: [email protected] (M. Hamilton).

https://doi.org/10.1016/j.cageo.2019.104321 Received 10 August 2018; Received in revised form 9 July 2019; Accepted 6 September 2019 Available online 12 September 2019 0098-3004/© 2019 Elsevier Ltd. All rights reserved.


Fig. 1. Marine geophysical data archived in NCEI’s trackline database include 5245 depth (top), 2965 magnetic (middle), and 2092 gravity tracks (bottom) [Mollweide equal area projection].

the University of Hawaii at Mānoa. We report these methods here to assist others in their archival tasks and describe their application to all R/V Kilo Moana geophysical data collected to date. Some examples of processing commands are given in the text, while the archival preparation script, ship2mgd77.sh, and source code, C programs udmerge.c and lopassvel.c, illustrating the production of archive files from raw underway data, are available online at https://github.com/GenericMappingTools/ship2mgd77. A subset of raw underway geophysical data collected in 2016 by R/V Kilo Moana during the KM1609 survey (Benyshek et al., 2017) is also provided online, for testing purposes, along with additional documentation, at www.soest.hawaii.edu/mgd77. The approach utilizes ubiquitous command line programs (e.g., sed, awk, and grep) and the Generic Mapping Tools (GMT, Wessel et al., 2013) and its mgd77 toolkit (Wessel and Chandler, 2007). GMT source code and installation instructions are available on GitHub at https://github.com/GenericMappingTools/gmt.

2. Methods

The primary objective of this methodology is to build an ASCII archive file in GMT’s tabular (DAT) format from which a valid archive file in MGD77T (i.e., the Marine Geophysical Data Exchange Format, Hittleman et al., 2010), MGD77+ (Wessel and Chandler, 2007), or legacy MGD77 format is generated via mgd77convert. The technique comprises two constructive components, data record (Section 2.1.1) and header (Section 2.1.2) production, followed by reduction of total field gravity (Section 2.2.1) and magnetic (Section 2.2.2) measurements, conversion to MGD77T/MGD77+/MGD77 (Section 2.3) and, lastly, data quality assessment and control (Section 2.4). These steps are sequential and cumulative: syntactically valid data records are needed for header derivation, a valid, unreduced archive file is required for the reduction of gravity and magnetic data, and an MGD77-formatted archive file is required for quality control. Because the technique will vary considerably for different vessels running different instruments and data structures, we present a generic example, based on R/V Kilo Moana data structures, to be modified and extended as needed.

Table 1
Typical marine geophysical trackline data formats.

Input formats

Depth_raw
  Syntax:  time-sequence record-type depth_em122 depth_em710
  Example: 2016 342 01 55 20 862 dpth 5850.5146 0.00

Depth_corr
  Syntax:  time-sequence lat lon depth_pick
  Example: 2016 342 01 55 20 862 -6.884271080 -176.276531992 5850.515

Navigation
  Syntax:  time-sequence record-type lat lon
  Example: 2016 342 01 55 20 766 gps -6.884273 -176.276527
           2016 342 01 55 21 266 gps -6.884263 -176.276553

Grav_raw
  Syntax:  time-sequence record-type meter-counts error-flag g_raw
  Example: 2016 342 01 55 19 863 rbgm3 024165 00 976075.290
           2016 342 01 55 20 863 rbgm3 024453 00 977536.380

Grav_reduced
  Syntax:  time-sequence lat lon g_filt ε_EC Δg_faa
  Example: 2016 342 01 55 20 862 -6.884271080 -176.276531992 978022.080 -80.553 -84.782

Mag_raw
  Syntax:  time-sequence record-type mtf mss msd_raw
  Example: 2016 342 01 55 20 826 magy 35926.055 1567 2.61
           2016 342 01 55 20 926 magy 35926.035 1550 2.64

Mag_reduced
  Syntax:  time-sequence lat lon mtf Δmag diur msd_corrected
  Example: 2016 342 01 55 20 862 -6.884271080 -176.276531992 35926.101 -73.952 14.984 6.570

Output formats

DAT: desired intermediate output format (GMT’s DAT format)
  Syntax:  drt tz year month day hour decimal-minute lat lon ptc twt depth bcc btc mtf mtf2 Δmag msens diur msd g_filt ε_EC Δg_faa nqc cruise-id ssln sspn
  Example: 5 0 2016 12 7 1 55.347700 -6.884271080 -176.276531992 9 nan 5850.515 99 9 35926.1 nan -73.952 9 14.984 6.57 978022.08 -80.553 -84.782 9 km1609 99999 999999

M77T: desired final output format (NCEI’s MGD77T)
  Syntax:  id tz date time lat lon ptc nqc twt depth bcc btc bqc mtf1 mtf2 Δmag msens diur msd mqc g_filt ε_EC Δg_faa gqc ssln sspn
  Example: km1609 0 20161207 155.3477 -6.8842711 -176.27653 5850.515 0 35926.1 -73.952 9 14.984 6.57 0 978022.08 -80.553 -84.782 0

Field abbreviations include: time-sequence (year ordinal-day hour minute second millisecond), depth_em122 (EM-122 center-beam depth), depth_em710 (EM-710 center-beam depth), lat (latitude), lon (longitude), depth_pick (chosen depth value), g_raw (raw observed gravity), g_filt (filtered observed gravity), ε_EC (Eötvös correction), Δg_faa (free-air anomaly), mtf (magnetic total field), mss (magnetic signal strength), msd_raw (magnetometer raw sensor depth), Δmag (magnetic anomaly), diur (diurnal correction), msd_corrected (magnetometer corrected sensor depth), drt (data record type), tz (time zone), ptc (position type code), twt (two-way travel time), bcc (bathymetric correction code), btc (bathymetry type code), mtf2 (magnetic total field, 2nd sensor), msens (sensor for residual field), nqc (navigation quality code), ssln (seismic line number), sspn (seismic shot-point number), bqc (bathymetry quality code), mqc (magnetic quality code), and gqc (gravity quality code). Data values from R/V Kilo Moana cruise KM1609.
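As a minimal sketch of the time conversion implied by Table 1, the following Python fragment turns a raw time-sequence (year, ordinal day, hour, minute, second, millisecond) into the year, month, day, hour, and decimal-minute fields of a DAT record. The function name is hypothetical and is not part of ship2mgd77.sh; it only illustrates the arithmetic.

```python
from datetime import datetime, timedelta


def time_sequence_to_dat(year, ordinal_day, hour, minute, second, millisecond):
    """Convert a raw time-sequence into DAT-style time fields.

    Returns (year, month, day, hour, decimal_minute). The helper name is
    hypothetical and only illustrates the conversion.
    """
    # Ordinal day 1 is January 1, so offset by (ordinal_day - 1) days.
    date = datetime(year, 1, 1) + timedelta(days=ordinal_day - 1)
    # Fold seconds and milliseconds into a fractional minute.
    decimal_minute = minute + (second + millisecond / 1000.0) / 60.0
    return date.year, date.month, date.day, hour, decimal_minute


# Time-sequence of the corrected-depth example record in Table 1.
fields = [int(f) for f in "2016 342 01 55 20 862".split()]
y, mo, d, h, dm = time_sequence_to_dat(*fields)
print(f"{y} {mo} {d} {h} {dm:.6f}")
```

Applied to the corrected-depth example in Table 1, this prints 2016 12 7 1 55.347700, which matches the time fields of the DAT example record.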

2.1. Archive file assembly

2.1.1. Data records

In converting disparate navigation, depth, magnetic, and gravity data sets into a valid exchange format file we must merge the various data files with navigation using time as the index field, as depicted in Table 1. This is because typically only the navigation file contains both time and position. The rest of the raw geophysical data files usually contain time with geophysical measurements. Because different instruments have different sampling rates and sampling times often differ from satellite navigation time stamps, we must be careful in assembling the merged archive file. This process of course assumes that time is properly synchronized within all logged data files. Errors do occur within geophysical acquisition systems and can also be introduced during transmission to and from processing and logging systems, hence every navigation and geophysical record needs to be checked for adherence to chronologic order, required number of fields, and validity of observables.

Gravity and magnetic fields measured at the sea surface are naturally attenuated by their distance from the source, thus in general vary more gradually than topography. However, due to the pronounced effect of vessel motion and sea state on the measurement of gravity at sea, smoothing of gravity data is imperative, especially when no hardware smoothing is applied. Regarding magnetics, special care must be taken with the data prior to smoothing. Data logged during deployment and recovery, for instance, or which suffer from low signal strength, need to be discarded along with data collected during strong geomagnetic storms. Future data users will not typically be aware of such occurrences and the magnitudes of such erroneous readings can be quite large, hence these issues are best dealt with at sea, prior to smoothing (Hamilton et al., 2019).

In preparing the archive file, it is possible to introduce extreme data and navigation artifacts by interpolating across temporal and spatial gaps. For instance, extreme speeds and erroneous positions are introduced by interpolating navigation across a large data gap or across a geographic discontinuity such as the Antimeridian or Prime Meridian. Invalid magnetic and gravity measurements can similarly be introduced by interpolating across large data gaps. For this reason, we avoid interpolation across temporal gaps exceeding the filter length (typically three to six minutes for gravity) and we avoid smoothing navigation as well. When navigation smoothing is necessary, we do not interpolate over gaps exceeding the filter width, and we avoid interpolating across geographic discontinuities by switching the longitude range between −180/+180 and 0/360 as needed.

Whereas gravity and magnetic data are generally logged in ASCII files containing time and potential field measurements, depth soundings are predominantly embedded within multibeam sonar files. Many ships will have extraction routines in place, but in cases where the center beam depth value and time are not automatically extracted and included in the raw underway data directory, it may be necessary to perform this extraction independently (e.g., using MB-System (Caress and Chayes, 1995): mblist -Otz -I datalist.txt). For vessels running deep and shallow multibeam sonars simultaneously, center beam depths for both are extracted, then the correct depth column is selected in processing. Sometimes the out-of-range sonar outputs values that cannot be distinguished from valid depth measurements automatically. This ambiguity could be a significant source of error in


geophysical archive files and could be avoided, for example, by logging from one sonar at a time.

When ASCII files for each acquired geophysical field contain the respective geophysical measurements, time, and position for each record, they can be merged, listed sequentially, and formatted into a single ASCII file. For simplicity, we arrange data into GMT’s tabular DAT format (e.g., Table 1) as an intermediate step. The DAT file can then be converted to MGD77T and MGD77+ using mgd77convert (Wessel and Chandler, 2007). We prefer using GMT’s DAT format as an intermediate text output format because the structure is fixed and all columns either contain values or are NaN-filled, hence it is easy to generate and read. MGD77T is more complex in that empty fields are denoted by tab characters, causing the structure to appear differently depending on the data fields present. Although MGD77T is more difficult to read and write, it attains significant file size reduction relative to GMT DAT because tab characters take up much less space than "NaN" strings. Therefore, MGD77T is preferable as the final output format for the purpose of exchanging geophysical trackline data with the scientific community. When utilizing existing data, however, it is often preferable to convert data to MGD77+, GMT’s NetCDF-based exchange format, which reduces file size even further and in which errata flags can be stored and applied to extract corrected data on-the-fly.

Regarding file sizes among the various exchange formats, MGD77+ files are by far the most efficient, followed by MGD77T and MGD77. GMT DAT files are the easiest to construct but are too large to be used for data exchange, hence this format is a useful intermediate format for archive file preparation. For instance, if all depth measurements are included and potential field data are digitized at 15 s intervals for R/V Kilo Moana’s KM1609 survey, file sizes are 14 MB for MGD77+, 30 MB for MGD77T, 40 MB for MGD77, and 49 MB for GMT DAT. We refer the reader to the MGD77T format specification (Hittleman et al., 2010) for more details about this and the several additional metadata fields and flags supported by MGD77T’s data record format, including seismic shot point and line numbers along with navigation quality codes and others. We also refer the reader to Wessel and Chandler (2007) for more information regarding the MGD77+ and GMT DAT formats. The original and Y2K-compliant MGD77 formats (i.e., Hittleman et al., 1977, and subsequent revisions) are supported by these methods but, due to their precision limitations, poor readability, and extra data I/O requirements (since most fields are stored in tenths of units with no field delimiters), these legacy formats are no longer preferred for new data.

2.1.2. Header

An archive file is not complete without a valid metadata header. The along-track and crossover analysis tools require that each archive file has valid header and data records. The header reports information applying to the overall survey, such as source institution name, archive file version, descriptions of instrumentation, chief scientist’s name, departure and arrival dates and ports, along with sampling and digitizing rates, among other metadata. For descriptions of the 58 header fields, the reader is again referred to the MGD77T specification.

When downloading geophysical trackline data from NCEI, we choose data records organized into individual files of complete surveys. To utilize such existing data sets we first concatenate the headers with their respective data record files. When dealing with newly acquired data sets, we create a header customized to the current survey using the mgd77header program, which was developed for this research and is also distributed with GMT. The program accepts values from a parameter file (e.g., Table 2), which we fill with the pertinent static parameters (e.g., vessel name and chief scientist name), and it also computes from the data the values of data-dependent header parameters such as temporal and geographic survey extents, the ten by ten degree bins crossed during the survey, and which geophysical fields were detected in the data records. For simplicity, we produce header files in GMT DAT format, which is identical to MGD77 and can be converted later, when it is combined with data records, to MGD77T. Upon completion, we concatenate the finished header (see Table 3) with the data records (e.g., cat cruisehdr.h77 cruisedata.dat > cruiseid.dat) and the draft archive file is complete. We do not assume that the data are fully processed at this point. Downloaded data sets may require quality control via errata-based correction (e.g., Chandler and Wessel, 2012) or other methods, whereas new data sets require anomaly reduction (Section 2.2), conversion (Section 2.3), and thorough scrutiny (Section 2.4).

2.2. Reduction of gravity and magnetic data

2.2.1. Gravity reduction

We assume as a starting point that gravity meter readings are already converted to mGal and that the meter will be properly tied in to the absolute gravity network. Therefore, gravity data will not be final until the closing tie-in is carried out and the data are corrected for any drift that may have occurred. Such drift is typically small, e.g., less than ∼0.09 mGal/day for the Bell Aerospace BGM-3 meter and less than 0.02 mGal/day for the LaCoste & Romberg Model "G" meter according to manufacturers, hence gravity monitoring efforts undertaken throughout the survey should not be invalidated when correcting for meter drift. Regarding gravity tie procedures, Bell and Watts (1986) provided useful gravity tie examples and analyzed BGM-3 drift rate and accuracy. If we are to be able to properly tie in logged gravity data to the absolute gravity network and to be adequately familiar with gravimeter functionality such that malfunctions may be quickly detected, gravity ties must be performed rigorously and consistently. Failure to properly tie in gravity meters is perhaps one of the leading reasons that marine gravity obtained by the academic fleet is considered less accurate than satellite-derived gravity (Sandwell et al., 2013).

Of course, marine gravity meters need to be corrected for vessel motion as well. To compute the required Eötvös correction ($\epsilon_{EC}$), so that vessel motion artifacts can be accounted for, we apply the standard formula (i.e., $\epsilon_{EC} = 7.5\,\upsilon \cos\phi \sin\alpha + 0.004\,\upsilon^{2}$ mGal, Dehlinger, 1978) using mgd77list (e.g., mgd77list cruiseid -A+f8 -Ftime,gobs,ceot,faa). The computed Eötvös correction is then subjected to the same Gaussian filter as was applied to observed gravity. Per MGD77T specifications (Hittleman et al., 1977), observed gravity must be corrected for Eötvös, drift, and tares and be reported in mGal. After accounting for these effects, free-air anomalies are then calculated using mgd77list, as observed plus Eötvös correction minus normal gravity (i.e., mgd77list cruiseid -A+f4 -Ftime,gobs,eot,faa). The format allows for reporting Eötvös calculations, but in practice it is seldom done, perhaps because the correction is already applied to the observed gravity data and can be recovered if observed and free-air gravity as well as the normal gravity formula (typically IGF80) are given. Because models of Earth’s normal gravity are occasionally modified, a best practice is to include observed gravity in the archive file, so that free-air anomalies may be recalculated in future studies. No further gravity reduction is supported within the MGD77T format.
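The gravity reduction arithmetic of Section 2.2.1 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the mgd77list implementation: speed is taken in knots and heading in degrees clockwise from north (the usual convention for the quoted Eötvös formula), normal gravity uses the commonly quoted 1980 international gravity formula coefficients, and all function names are hypothetical.

```python
import math


def eotvos_correction(speed_knots, lat_deg, heading_deg):
    """Eotvos correction in mGal: 7.5 v cos(lat) sin(heading) + 0.004 v^2.

    Assumes speed in knots and heading in degrees clockwise from north.
    """
    phi = math.radians(lat_deg)
    alpha = math.radians(heading_deg)
    return (7.5 * speed_knots * math.cos(phi) * math.sin(alpha)
            + 0.004 * speed_knots ** 2)


def normal_gravity_1980(lat_deg):
    """Normal gravity in mGal from the 1980 formula (assumed coefficients)."""
    phi = math.radians(lat_deg)
    return 978032.7 * (1.0
                       + 0.0053024 * math.sin(phi) ** 2
                       - 0.0000058 * math.sin(2.0 * phi) ** 2)


def free_air_anomaly(g_obs, eotvos, lat_deg):
    """Free-air anomaly: observed gravity plus Eotvos minus normal gravity.

    g_obs must NOT already include the Eotvos correction; pass eotvos=0.0
    if it does, otherwise the correction is applied twice.
    """
    return g_obs + eotvos - normal_gravity_1980(lat_deg)


# A vessel steaming due east at 10 knots on the equator.
print(eotvos_correction(10.0, 0.0, 90.0))
```

Keeping observed gravity separate from the correction terms, as in this sketch, is what allows free-air anomalies to be recalculated later if the normal gravity formula changes.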

2.2.2. Magnetic reduction

This passage applies to total field magnetometers towed behind a vessel, typically at distances of ∼2.5 times vessel length. Reduction of three component magnetic data is considerably more involved and is still under active development by the geophysical community (e.g., Isezaki, 1986; Korenaga, 1995; Honsho et al., 2013). An archive file should contain raw total field measurements as well as the residual magnetic anomaly calculated using the latest geomagnetic reference field and should be corrected for diurnal variation if possible. Chandler and Wessel (2012) reported that 11% of magnetic surveys archived at NCEI lack total field magnetic measurements. Because magnetic anomalies need to be recomputed when definitive, rather than predictive, models of Earth’s geomagnetic field become available, scientists

Table 2
Building the metadata header: Sample input parameter file.

Survey_Identifier km1609
Chief_Scientist P Wessel
Source_Institution SOEST University of Hawaii
Country USA
Platform_Name Kilo Moana
Platform_Type_Code 1
Platform_Type SHIP
Project_Cruise_Leg Test Data KM1609 Day 342
Port_of_Departure Pago Pago, American Samoa
Port_of_Arrival Pago Pago, American Samoa
Navigation_Instrumentation DGPS
Geodetic_Datum_Position_Determination_Method WGS84/SATELLITE
Bathymetry_Instrumentation Kongsberg EM122/EM710
Magnetics_Instrumentation Geometrics Cesium Mag G-882
Gravity_Instrumentation Bell Aerospace BGM-3
Bathymetry_Digitizing_Rate 99
Bathymetry_Sampling_Rate 1/PING
Bathymetry_Assumed_Sound_Velocity 1500
Bathymetry_Datum_Code 00
Bathymetry_Interpolation_Scheme EXTRACTED FROM MOST VERTICAL SWATH BEAM NO INTERPOLATION
Magnetics_Digitizing_Rate 0
Magnetics_Sampling_Rate 01
Magnetics_Sensor_Tow_Distance 250
Magnetics_Ref_Field_Code 88
Magnetics_Ref_Field IGRF-15
Magnetics_Method_Applying_Res_Field IGRF/RES FIELD COMPUTED PER RECORD BY MGD77LIST
Gravity_Digitizing_Rate 0
Gravity_Sampling_Rate 0
Gravity_Theoretical_Formula_Code 4
Gravity_Theoretical_Formula IAG SYSTEM 1980
Gravity_Reference_System_Code 3
Gravity_Reference_System SYSTEM IGSN 71
Additional_Documentation_3 OBSERVATION LOCATIONS INTERPOLATED FROM GPS VIA LINEAR INTERPOLATION
Additional_Documentation_6 MAGNETICS 60 S GAUSSIAN FILTER, DIGITIZED AT DEPTH TIMES
Additional_Documentation_7 GRAVITY 360 S GAUSSIAN FILTER, DIGITIZED AT DEPTH TIMES

Table 3
Preliminary header in MGD77/GMT DAT format generated by mgd77header.

Seq  Header record content
01   4km1609 MGD77km1609 5551120190712 SOEST University of Hawaii
02   USA Kilo Moana 1SHIP P Wessel
03   Test Data KM1609 Day 342 NSF
04   20161207 Pago Pago, American Samoa 20161207 Pago Pago, American Samoa
05   DGPS WGS84/SATELLITE
06   Kongsberg EM122/EM710
07   Geometrics Cesium Mag G-882
08   Bell Aerospace BGM-3
09
10
11   -6 -8 -178 -175
12   99 1/PING 1500 00 EXTRACTED FROM MOST VERTICAL SWATH BEAM NO INTERPOLATION
13   0 01250 88IGRF-15 IGRF/RES FIELD COMPUTED PER RECORD BY MGD77LIST
14   0 004 IAG SYSTEM 1980 3 SYSTEM IGSN 71
15
16   01 5017,9999, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
17   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
18
19
20   OBSERVATION LOCATIONS INTERPOLATED FROM GPS VIA LINEAR INTERPOLATION
21
22
23   MAGNETICS 60 S GAUSSIAN FILTER, DIGITIZED AT DEPTH TIMES
24   GRAVITY 360 S GAUSSIAN FILTER, DIGITIZED AT DEPTH TIMES

Note: Preliminary header compiled from Table 2 parameters and data via mgd77header km1609.dat -Hparams.txt -Mr.

reporting preliminary magnetic anomalies without total field data are potentially hindering the future utility of their data. However, the reduction of magnetic data at sea, though preliminary, can be accomplished easily using GMT’s mgd77list program (e.g., mgd77list cruiseid -A+m2 -Ftime,mtf1,mag). At sea we are usually not able to determine diurnal variation, so the diurnal correction is often omitted from the first generation archive file. It is typically determined post-cruise from a nearby geomagnetic observatory or from comprehensive magnetic models such as CM4 and CM5 (Sabaka et al., 2004, 2015). For example, to compute diurnal variation from the CM4 model via GMT’s mgd77 tools: mgd77magref cruiseid_xytime.d -Frt/3456. Although the CM4 model is no longer valid for modeling secular drift of the main geomagnetic field and CM5 has not yet been implemented in GMT, mgd77magref can compute the main field and external components from the current IGRF model and from the Dst and F10.7 geomagnetic disturbance indices, respectively. When the definitive geomagnetic reference field is available, magnetic anomalies should be recalculated and the outdated archive file should be replaced. Outdated files are distinguished from new submissions via the NCEI-assigned file creation date; digital object identifiers are not yet assigned to individual archive files.
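To illustrate why retaining total field measurements matters, the sketch below recomputes residual anomalies from archived total field values when an updated reference field becomes available. It is a simplified illustration, not the mgd77list algorithm: the record layout and function name are hypothetical, the reference field values are assumed to be supplied externally (for example from a reference field model evaluated at each record), and the diurnal correction is deliberately left out.

```python
import math


def recompute_anomalies(records, reference_field_nt):
    """Return copies of records with 'mag' recomputed as mtf - reference.

    records: list of dicts with keys 'mtf' (total field, nT) and
             'mag' (previously reported anomaly, nT); NaN marks missing.
    reference_field_nt: reference field value (nT) for each record.
    Records lacking a total field value keep their old anomaly, since
    nothing can be recomputed for them.
    """
    updated = []
    for rec, ref in zip(records, reference_field_nt):
        new_rec = dict(rec)
        if not math.isnan(rec["mtf"]):
            new_rec["mag"] = rec["mtf"] - ref
        updated.append(new_rec)
    return updated


# Hypothetical records and reference values, for illustration only.
records = [
    {"mtf": 35926.1, "mag": -73.952},
    {"mtf": math.nan, "mag": -50.0},  # total field not archived
]
new_records = recompute_anomalies(records, [36000.0, 36000.0])
print([round(r["mag"], 3) for r in new_records])
```

The second record shows the limitation discussed above: without the total field, the anomaly stays tied to whichever reference field was used when it was first computed.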


2.3. Conversion to MGD77 formats

Under this methodology, proper preparation of DAT files is an archivist’s primary challenge, as syntax errors will likely invalidate converted archive files. The DAT file should therefore be checked for accuracy prior to conversion and, after conversion to an MGD77x format, a reverse-converted DAT file should be identical to the original. Conversion from GMT DAT to MGD77T, MGD77+, and MGD77 formats is accomplished as follows: MGD77T, mgd77convert -Ft -Tm; MGD77+, mgd77convert -Ft -Tc; and MGD77, mgd77convert -Ft -Ta. To further demonstrate, our GMT DAT and MGD77T files pertaining to day 342 of the KM1609 survey are included within the sample data, as formatting aids.

4. Discussion

Although this methodology was initially developed to prepare existing marine geophysical trackline data for archival, it is equally compatible with archive preparation while at sea. There are numerous advantages associated with compiling the archive file in real-time, while at sea, and reiterating the process as new data are logged. At sea, there is typically a concentration of qualified personnel and ample time for doing the work, without the usual distractions we encounter on shore. Furthermore, a complete archive file can be created by survey end in a straightforward manner by first compiling a first generation, unreduced archive file containing total field measurements and then computing and filtering gravity and magnetic anomalies. Then the second generation (reduced and filtered) preliminary archive file can be scrutinized using along-track analysis to detect extreme errors, such as erroneous anomaly calculations, navigation problems, and excessive offsets from global bathymetry and gravity grids. By iteratively updating the preliminary archive file as new data come in, examining it along-track, and inspecting it visually throughout the survey, extreme errors can be detected and removed and the complete archive file can be considered to be fully scrutinized. Also, the same archive file can then be used by the scientist for interpretation as well as submitted to the data repository for preservation and distribution. In preparing marine geophysical archive files, one is faced with various filtering and resampling options. We need to consider the extent to which filtering is applied to best remove outliers (which varies according to sea state and instrumentation), what digitization rate is appropriate for the chosen filter, and importantly that the archive file should contain full resolution so that future studies are not hampered by unnecessary decimation.
For the latter reason, and given modern data storage capacities and the assumed importance of each measurement, preserving full bathymetric resolution is a noncontroversial best practice. But with regard to gravity and magnetic data handling, which typically have sampling rates of 1 Hz or higher, the process is not so straightforward. For example, it seems impractical to digitize heavily filtered gravity data at 1 Hz or to digitize lightly filtered magnetic data once every five minutes. Typically, a variable length Gaussian filter, ranging from three to six minutes in length depending on sea state, navigation quality, and characteristics of a particular meter, is applied to marine gravity measurements and vessel motion corrections (e.g., Bell and Watts, 1986). Shorter wavelength filtering may be required for magnetics as well to remove outliers.

For this study, we implemented two distinct data digitization approaches. One retains full bathymetric resolution while resampling gravity and magnetics at fixed digitization intervals. This is accomplished by resampling the navigation file, which typically gives position at 1 s intervals, at each depth measurement time and at the desired resampling rates for gravity and magnetic data, using GMT’s sample1d program, then listing and merging data records into archive format as described in Section 2. The second approach involves resampling the navigation file at each depth measurement time (to preserve full bathymetric resolution), such that a table consisting of all depth measurement times is read in by the sample1d program. Magnetics and smoothed gravity are then resampled at each depth measurement time so that the output archive file contains all three geophysical fields in a single record, subject to data availability. In both approaches, instrument position is estimated at each measurement time that falls between navigation fixes using linear interpolation (e.g., sample1d -Fl -Ndepth-times.d ...).
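The position estimate at a measurement time can be sketched as gap-aware linear interpolation between the two bracketing navigation fixes. The Python below is a minimal illustration and not the sample1d implementation: the function name, the tuple layout, and the 360 s default gap limit (taken from the six-minute upper filter length mentioned above) are assumptions. It also interpolates longitude along the shorter arc so that a track crossing the Antimeridian is not dragged around the globe.

```python
from bisect import bisect_left


def interpolate_position(nav, t, max_gap_s=360.0):
    """Linearly interpolate (lat, lon) at time t from sorted navigation fixes.

    nav: list of (time_s, lat_deg, lon_deg) tuples in increasing time order.
    Returns None when t lies outside the navigation record or when the
    bracketing fixes are more than max_gap_s apart (no interpolation
    across large gaps).
    """
    times = [fix[0] for fix in nav]
    i = bisect_left(times, t)
    if i < len(nav) and times[i] == t:
        return nav[i][1], nav[i][2]          # exact fix, nothing to do
    if i == 0 or i == len(nav):
        return None                          # never extrapolate
    t0, lat0, lon0 = nav[i - 1]
    t1, lat1, lon1 = nav[i]
    if t1 - t0 > max_gap_s:
        return None                          # gap too large to bridge
    w = (t - t0) / (t1 - t0)
    # Longitude difference along the shorter arc, in [-180, 180).
    dlon = (lon1 - lon0 + 180.0) % 360.0 - 180.0
    lon = (lon0 + w * dlon + 180.0) % 360.0 - 180.0
    lat = lat0 + w * (lat1 - lat0)
    return lat, lon


# Navigation fixes from Table 1, with times as seconds past 01:55:00.
nav = [(20.766, -6.884273, -176.276527), (21.266, -6.884263, -176.276553)]
lat, lon = interpolate_position(nav, 20.862)
print(f"{lat:.9f} {lon:.9f}")
```

With the two navigation fixes of Table 1 and the depth time 01:55:20.862, the result agrees with the interpolated position listed for the corrected-depth record (−6.884271080, −176.276531992).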
We contrast the two digitizing approaches by considering R/V Kilo Moana cruise KM0912, which acquired multibeam and gravity data as it passed over both young and old seafloor and two trenches on its way from Guam to Fiji in 2009 (see Fig. 3). If we digitize these LaCoste & Romberg gravity data at a fixed interval of, for example, 15 s, which is comparable to the median sonar ping rate for that cruise, the resultant MGD77T archive has 82,892 data records with a file

2.4. Data quality assessment and control

Once the archive file is constructed and anomalies are reduced, another essential component of preparing the data set for archival remains: data quality assessment and control. Items requiring attention include adherence to the exchange file specification, adequacy of the archive header, validity of the included geophysical data, presence of extreme errors, and adequacy of data digitization. The archivist ensures that header and data records fully characterize field observations and are sufficiently populated for wide-ranging subsequent use. Data and format errors can be detected using existing methods, including along-track (Chandler and Wessel, 2008, 2012) and across-track (e.g., Wessel, 2010) analyses, as well as data visualization techniques such as the Google Earth-based techniques of Hamilton et al. (2019). Along-track analysis was designed specifically to detect a wide array of errors within individual trackline archive files. Briefly, tests include excessive speed detection, navigation passing over land, time zone crossing errors, excessive data gradients, unrealistic data values, extended offsets and systematic differences from satellite-derived gravity and global predictive bathymetry grids, and disagreement between reported and recomputed magnetic and gravity anomalies, among a host of other tests. Across-track analysis, which finds and optionally removes discrepancies between measurements at ship track intersections, is often applied to potential field data having numerous crossing lines and to collective analyses involving many dozens or hundreds of surveys; hence it is only applicable given sufficient intersections. Along-track analysis can detect extreme errors directly, within individual tracks, so we apply the method prior to across-track and other analyses. We refer the reader to the cited studies for further descriptions of quality assessment techniques.
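As an illustration of one along-track test named above, excessive speed detection, the following Python sketch flags records whose implied speed from the previous record exceeds a threshold. It is a simplified stand-in and not the along-track analysis code of Chandler and Wessel (2008); the function names, the record layout, and the 10 m/s threshold are assumptions chosen for the example, and distances use a spherical Earth.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius (spherical approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def flag_excessive_speed(records, max_speed_ms=10.0):
    """Return indices of records reached at an implausible speed.

    records      : time-sorted list of (time_s, lon_deg, lat_deg)
    max_speed_ms : hypothetical threshold in m/s (an assumption, not a
                   value taken from the paper)
    Records whose time does not increase are flagged as well.
    """
    flagged = []
    for i in range(1, len(records)):
        t0, lon0, lat0 = records[i - 1]
        t1, lon1, lat1 = records[i]
        dt = t1 - t0
        if dt <= 0:
            flagged.append(i)           # duplicate or backward time stamp
        elif haversine_m(lon0, lat0, lon1, lat1) / dt > max_speed_ms:
            flagged.append(i)           # position jump too large for dt
    return flagged


# Hypothetical records 15 s apart; the third contains a position jump.
records = [(0, 144.0, 13.000), (15, 144.0, 13.001),
           (30, 144.0, 13.011), (45, 144.0, 13.012)]
suspect = flag_excessive_speed(records)
```

In practice a flagged index only points the archivist to a record worth inspecting; whether the fault lies in position or time has to be decided from the raw data.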
Regarding quality control of preliminary archive files, we strive to correct extreme errors so that subsequent data users are not burdened with them. Although along-track analysis provides for data correction using its errata correction system (e.g., Chandler and Wessel, 2012), the errata system is intended for the correction of historic data sets without altering the archive file. Archivists have access to raw data and are able to correct data problems; hence it would be nonsensical to produce permanent errata correction tables for new archive files. However, temporary errata tables quickly summarize errors detected via along-track analysis and, hence, are of use to the data archivist.

3. Application & results

Using the methods described above we produced 340 archive files from raw underway geophysical data gathered by R/V Kilo Moana from 2002 to 2018. LaCoste & Romberg (2002–2011) and subsequently Bell BGM-3 gravity (2012–2018) and Geometrics G-882 magnetics, when available, were resampled at 15 s intervals. Full depth resolution was retained for center beam depths extracted from EM120 (2002–2009) and EM122 (2010–2018) sonars and from EM1002 (2002–2011) and EM710 (2012–2018) sonars when depths did not exceed 500 m. The locations of center-beam depth, magnetic and gravity measurements collected during these surveys are shown in Fig. 2; we are currently in the process of submitting the non-proprietary archive files to NCEI.


Fig. 2. Navigation tracks pertaining to non-proprietary archive files prepared in this study for R/V Kilo Moana include 314 depth (top), 61 magnetic (middle), and 300 gravity tracks (bottom) [Mercator projection]. For each field, measurements and their locations were extracted from the pertinent archive file as follows: mgd77list cruiseid -Flon,lat,field,‘field!=NaN’, where cruiseid is a specific archive file name (e.g., km1609) and field is either depth, mag, or faa, and NaN-filled (empty) records were omitted. Cruise tracks were plotted using psxy and data gaps exceeding 10 km were not drawn to avoid connecting tracks where data were not collected.

size of 6.6 MB. By digitizing gravity data at depth times for the same survey, the output MGD77T archive file has 34,208 data records and a file size of 3.2 MB. The former archive file is larger, mainly, because sonar ping times rarely coincide with fixed digitization times of gravity and magnetics and because the potential field data are digitized more frequently in deeper water than occurs with the latter approach. These results indicate that, by resampling potential field data to depth times, archive files gain efficiency due to coincident reporting of all available geophysical data in each data record and by digitizing more potential field data where needed (i.e., in shallow settings) and less in deep settings, where such high potential field data density is not needed. Because we choose to retain all depth soundings, file size will still be larger, even in exclusively shallow settings, if using a fixed digitization rate due to non-coincident reporting of depth and potential field data.
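The record-count difference between the two digitization approaches can be sketched in Python as follows. The numbers are synthetic and are not the KM0912 values; the function names are hypothetical, and the sketch assumes one archive record per distinct time stamp, with integer times in seconds.

```python
def fixed_interval_times(start, end, step):
    """Times start, start+step, ... that do not exceed end (integer seconds)."""
    n = (end - start) // step
    return [start + k * step for k in range(n + 1)]


def records_fixed_rate(depth_times, field_times):
    """Approach 1: keep every ping and every fixed-rate potential field sample.

    A record is needed for each distinct time, so the count is the size of
    the union; records merge only where a ping and a sample coincide.
    """
    return len(set(depth_times) | set(field_times))


def records_at_depth_times(depth_times):
    """Approach 2: potential fields are resampled onto the ping times."""
    return len(set(depth_times))


# Synthetic example: four pings in one minute, gravity digitized every 15 s.
depth_times = [3, 19, 35, 51]
gravity_times = fixed_interval_times(0, 60, 15)

n_fixed = records_fixed_rate(depth_times, gravity_times)
n_depth = records_at_depth_times(depth_times)
```

Because ping times and fixed digitization times rarely coincide, the union is close to the sum of the two sets, which is the main reason the fixed-rate archive file is larger.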

Hence, if all depths are retained and potential field data were collected, resampling potential field data to depth times is the more efficient approach in both deep and shallow settings. Regarding the fidelity of potential field data resampled at depth times, digitizing at sonar ping intervals of 3 s and 25 s in shallow and deep settings, respectively (e.g., Fig. 3, right panel), or 0.33 Hz and 0.04 Hz, implies Nyquist frequencies of 0.167 Hz and 0.02 Hz and resolvable periods of 6 s and 50 s, which translate to spatial wavelengths of 36 m and 300 m at a typical survey speed of 6 m/s (∼11.7 knots). In theory, there should be no degradation of the naturally attenuated signals at these digitization rates, according to Shannon's theorem (Bracewell, 1978). Conversely, it could be possible when using a fixed digitization rate to inadequately characterize potential fields in shallow settings, depending on depth; this could occur frequently in


Fig. 3. Kongsberg EM-120 sonar ping intervals for R/V Kilo Moana's KM0912 expedition crossing the Mariana and Vitiaz trenches en route from Apra, Guam to Suva, Fiji in 2009. Ping intervals predominate between 3 and 25 s (left). Ping interval increases with depth from ∼9 s at 1000 m depth to ∼24 s for the 3000 to 11,000 m depth range (right). Median ping interval and depth for cruise KM0912 are 16 s and 2506 m, respectively.

CRediT authorship contribution statement

shallow surveys and when leaving and returning to port. By digitizing potential field data at depth times, a higher digitizing rate is achieved in shallower areas where sonar ping rate is higher (Fig. 3, right panel) and where gravity and magnetic signals should be more frequently digitized due to closer proximity to the source. Such an approach has been taken in the past: We determined that ∼32% of surveys reporting depth, gravity, and magnetics in the NCEI trackline archive report depth and gravity or depth and magnetics in the same records at least 90% of the time. Although resampling gravity and magnetics to depth times yields archive files that retain full bathymetric resolution and sufficient potential field data in deep and shallow settings at a minimized file size, the MGD77 format specifies fixed digitization rates for potential field data, which might need to vary on a per cruise basis depending on water depth.
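The Nyquist arithmetic quoted for ping intervals of 3 s and 25 s can be reproduced with a short Python sketch. The function name `resolvable_scales` is hypothetical; the 6 m/s survey speed is the typical value given in the text.

```python
KNOTS_PER_MS = 3600.0 / 1852.0  # 1 knot is 1852 m per hour


def resolvable_scales(ping_interval_s, speed_ms=6.0):
    """Sampling and Nyquist quantities for data digitized at ping times.

    ping_interval_s : time between samples (s)
    speed_ms        : vessel speed over ground (m/s)
    """
    sampling_hz = 1.0 / ping_interval_s
    nyquist_hz = sampling_hz / 2.0          # highest resolvable frequency
    period_s = 2.0 * ping_interval_s        # shortest resolvable period
    wavelength_m = period_s * speed_ms      # same limit expressed along track
    return {
        "sampling_hz": sampling_hz,
        "nyquist_hz": nyquist_hz,
        "period_s": period_s,
        "wavelength_m": wavelength_m,
    }


shallow = resolvable_scales(3.0)    # 6 s period, 36 m wavelength
deep = resolvable_scales(25.0)      # 50 s period, 300 m wavelength
speed_knots = 6.0 * KNOTS_PER_MS    # about 11.7 knots
```

The shortest resolvable wavelength scales linearly with both ping interval and ship speed, so slower surveying in shallow water tightens the limit further.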

Michael Hamilton: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing, Data curation. Paul Wessel: Funding acquisition, Methodology, Software, Writing – review & editing. Brian Taylor: Project administration, Funding acquisition, Methodology, Writing – review & editing. Joaquim Luis: Writing – review & editing.

Acknowledgments

Funding was provided in part by the University of Hawai‘i at Mānoa's School of Ocean and Earth Science and Technology and by U.S. National Science Foundation grants 1458964 and 1558403. J. Scott Ferguson, Joyce Miller and Alexander Shore facilitated access to R/V Kilo Moana's raw underway geophysical data. Three anonymous reviewers provided critical reviews. This is SOEST publication number 10791.

Because of this subjectivity in digitizing approaches and the potential for inappropriate filtering and/or digitization of potential field data, it seems desirable that the MGD77T format be extended to also store unfiltered, raw measurements at their original resolution. Although this is not currently possible using MGD77/MGD77T formats, it is possible to add such raw data columns to MGD77+ files. Hence, due to MGD77+’s extendability, built-in support for errata correction flags, and the fact that MGD77+ files are ∼2x smaller than MGD77T files, it would appear to be a suitable alternative to MGD77T for data exchange and archival purposes. Being NetCDF-based, it is also portable across all operating systems.

References

Bell, R.E., Watts, A.B., 1986. Evaluation of the BGM-3 sea gravity meter system onboard R/V Conrad. Geophysics 51, 1480–1493.
Benyshek, E., Wessel, P., Chandler, M.T., Taylor, B., Wright, N.M., Hellebrand, E., Davidson, P., Koppers, A., 2017. Mapping the Ellice Basin: Preliminary results from cruise KM1609. In: 113th GSA Cordilleran Section Annual Meeting. Honolulu, USA.
Bracewell, R.N., 1978. The Fourier Transform and Its Applications. McGraw-Hill, London.
Caress, D.W., Chayes, D.N., 1995. New software for processing sidescan data from sidescan-capable multibeam sonars. In: Proceedings of the OCEANS Conference. MTS/IEEE.
Chandler, M.T., Wessel, P., 2008. Improving the quality of marine geophysical track line data: Along-track analysis. J. Geophys. Res. 113.
Chandler, M.T., Wessel, P., 2012. Errata-based correction of marine geophysical trackline data. Geochem. Geophys. Geosyst. 13, http://dx.doi.org/10.1029/2012GC004294.
Dehlinger, P., 1978. Marine Gravity. Vol. 22. Elsevier Scientific Publishing Company, p. 322.
Hamilton, M., Wessel, P., Luis, J., Taylor, B., Ko, Y.T., 2019. The seagoing scientist's toolbox: Integrated methods for archival and quality control of marine geophysical data at sea. Geochem. Geophys. Geosyst. http://dx.doi.org/10.1029/2018GC007891.
Hittleman, A.M., Groman, R.C., Haworth, T.L., Holcombe, T.L., McHendrie, S.M., 1977. The Marine Geophysical Data Exchange Format MGD77 - MGD77 - Key to Geophysical Records Documentation, Vol. 10. National Geophysical Data Center, Boulder, Colorado, USA.

5. Computer code availability

Shell script ship2mgd77.sh and C programs udmerge.c and lopassvel.c were written by Michael Hamilton (see title page for contact information) and are available for download at https://github.com/GenericMappingTools/ship2mgd77. The code can be executed under any operating system supporting the Bourne or bash shell, the compilation of C programs, and the installation of GMT and its supporting programs. Compilation instructions for C programs are provided within the source code. Instrumentation and data structures may vary from ship to ship and year to year. We are also obliged to update the code as the underlying command line tools evolve. Some modification of the code may be required in order to achieve the desired result.


Sabaka, T.J., Olsen, N., Tyler, R.H., Kuvshinov, A., 2015. CM5, a pre-swarm comprehensive geomagnetic field model derived from over 12 yr of CHAMP, Ørsted, SAC-C and observatory data. Geophys. J. Int. 200, 1596–1626.
Sandwell, D.T., Garcia, E., Soofi, K., Wessel, P., Chandler, M.T., Smith, W.H.F., 2013. Towards 1 mGal global marine gravity from CryoSat-2, Envisat, and Jason-1. Lead. Edge 32, 892–899.
Wessel, P., 2010. Tools for analyzing intersecting tracks: The x2sys package. Comput. Geosci. 36, 348–354.
Wessel, P., Chandler, M.T., 2007. The mgd77 supplement to the Generic Mapping Tools. Comput. Geosci. 33, 62–75.
Wessel, P., Smith, W.H.F., Scharoo, R., Luis, J., Wobbe, F., 2013. Generic Mapping Tools: Improved version released. In: Eos Trans. AGU, pp. 409–420.

Hittleman, A.M., Groman, R.C., Haworth, T.L., Holcombe, T.L., McHendrie, S.M., 2010. The Marine Geophysical Data Exchange Format MGD77 - MGD77T/Legacy MGD77 - Key to Geophysical Records Documentation 10 (Revised). National Geophysical Data Center, Boulder, Colorado, USA.
Honsho, C., Ura, T., Kim, K., 2013. Deep-sea magnetic vector anomalies over the Hakurei hydrothermal field and the Bayonnaise knoll caldera, Izu-Ogasawara arc, Japan. J. Geophys. Res. 118, 5147–5164.
Isezaki, N., 1986. A new shipboard three-component magnetometer. Geophysics 51, 1992–1998.
Korenaga, J., 1995. Comprehensive analysis of marine magnetic vector anomalies. J. Geophys. Res. 100, 365–378.
Sabaka, T.J., Olsen, N., Purucker, M.E., 2004. Extending comprehensive models of the Earth's magnetic field with Ørsted and CHAMP data. Geophys. J. Int. 159, 521–547.
