Towards a digital infrastructure for engineering materials data


G Model

ARTICLE IN PRESS

MD 8 1–12

Materials Discovery xxx (2016) xxx–xxx

Contents lists available at ScienceDirect

Materials Discovery journal homepage: www.elsevier.com/locate/md


Tim Austin
European Commission, Joint Research Centre (JRC), Institute for Energy and Transport (IET), Innovative Technologies for Nuclear Reactor Safety Unit, Westerduinweg 3, 1755 LE Petten, Netherlands


Article info

Article history:
Received 16 December 2015
Accepted 18 December 2015
Available online xxx

Keywords: Engineering materials; Data formats; Standardization; Data citation; Interoperability

Abstract

The industrial and research sectors make significant investments in developing and producing engineering materials. These materials are manufactured and qualified in accordance with a body of product and testing standards that have evolved over many decades to meet ever more demanding requirements. Yet the very significant volume of data that results from these activities remains largely unavailable. While efforts to establish a digital infrastructure for engineering materials data can be traced back more than three decades, widespread adoption of machine-readable formats to enable the routine transfer of engineering materials data has yet to be realized. Given the reliance on electronic systems in all aspects of engineering materials development, manufacture, and qualification, it is an anomaly that simply preserving and transferring engineering materials data remains an issue. This anomaly is accentuated by the fact that in recent years other business domains have benefited from the integration of web technologies into established business models. To address these shortcomings, a digital infrastructure is needed that allows and encourages the seamless transfer of engineering materials data between different systems. It is in this context that renewed efforts to develop standard formats for engineering materials data are taking place in the frame of CEN Workshops. While building on prior activities at ASTM and ISO, this work leverages existing product and testing standards with a view to engaging the engineering materials community. With preliminary technical specifications having already been demonstrated to streamline the data transfer process, attention is turning to the long-standing challenge of promoting a culture of data sharing.

Whereas researchers and industrial organizations previously lacked the motivation to share data, the initial impact of the DataCite framework for data citation on the use of the European Commission materials database hosted at http://odin.jrc.ec.europa.eu suggests a sea change in data sharing and reuse. This paper describes the status of the work to develop data formats for engineering materials in the frame of CEN Workshops and reports on the added value of data citation beyond simply ensuring that data creators are properly credited for their work. It also reports the outcome of work to enable the European Commission materials database to support standards-compliant data formats and data citation, whereby the barriers to systems integration have been considerably reduced and, irrespective of the level of confidentiality, organizations in both the industrial and research sectors now routinely enable their data sets for citation. Together with recent innovations in digital publishing, a renewed interest in the development of standards for engineering materials data offers new prospects for the discovery, exchange, and reuse of engineering materials data. Taken together with other data-centric initiatives, such as physics-based and multi-scale modelling, Open Data, and linked data, standard data formats and data publishing can reasonably be argued to herald a transition towards a digital infrastructure for engineering materials data.

© 2016 Published by Elsevier Ltd.

1. Introduction

The drivers for conserving and exchanging data are no longer confined to the vested interests of individuals and organizations wanting to preserve their intellectual and financial investments in data creation activities. Instead, government, institutional, funding agency, and publisher policies demand more responsible approaches to data management. Given the increased opportunities for innovation made possible by the web and by technologies for publishing and transferring data, organizations and communities that fail to adopt business practices that leverage web connectivity risk losing their competitive advantage. Although technologies for engineering materials data exist, to date they have largely failed to find widespread adoption [1]. Similar failings with semantic technologies in the life sciences suggest that the failure to achieve widespread adoption can be traced to insufficient engagement with the broader community [2]. There is thus a strong case for revisiting the development of technologies for engineering materials data with a view to ensuring that the wider engineering materials community is properly engaged. The task at hand is therefore to establish a digital infrastructure for engineering materials data that will allow organizations to compete effectively in a digital market. This paper aims to demonstrate that the testing and product standards of the engineering materials community provide a robust framework on which to develop data formats and that, in combination with data citation, these formats offer very real opportunities to gain added value.

E-mail address: [email protected]

http://dx.doi.org/10.1016/j.md.2015.12.003
2352-9245/© 2016 Published by Elsevier Ltd.

Please cite this article in press as: T. Austin, Towards a digital infrastructure for engineering materials data, Mater. Discov. (2016), http://dx.doi.org/10.1016/j.md.2015.12.003


1.1. European policy


In the scope of its nuclear safety standardization interests, the Joint Research Centre of the European Commission supports the development of standard formats for engineering materials data and their use for the efficient storage and transfer of nuclear materials information [3]. More generally, the European Commission is committed to ensuring that data resulting from publicly funded research become publicly accessible, usable, and re-usable through digital e-infrastructures; that data sets are made easily identifiable and can be linked to other data sets and publications through appropriate mechanisms, with additional information provided to enable their proper evaluation and use; and that institutions responsible for managing public research funding, as well as publicly funded academic institutions, assist in implementing national policy by putting in place mechanisms that enable and reward the sharing of research data [4,5].


1.2. Research data management


In recent years the sciences have witnessed a fundamental change in attitude to research data management. This is evidenced by various phenomena, including the widespread adoption of data management policies, the growing number of posts for data specialists, and innovations in data publishing, all of which indicate that data management is becoming integral to the mainstream research process.

1.2.1. Data management policies

A 2010 report by the Publishing Research Consortium (PRC) on importance versus ease of access to scientific outputs indicated that, irrespective of discipline, geographic location, organization type, and respondent demographics, data sets are regularly identified as having high value but poor accessibility [6]. Such findings played a role in convincing funding agencies to develop policies for research data management [7], so that today there is a global trend towards more responsible research data management practices. For example, all UK Research Councils have adopted data management policies [8]. Together with agencies providing the necessary technical support, such as the Digital Curation Centre (DCC) and the Joint Information Systems Committee (JISC), this has led many higher education institutions in the UK to stipulate data management policies [9]. Similarly, at the European level, the European Commission is committed to the more effective use of data generated through publicly funded activities [10]. In the domain of European research, this commitment is manifest in the Horizon 2020 Open Data policy [11]. Beyond Europe, the global perspective was set over a decade ago by the OECD, which, at the request of Science and Technology Ministers, established guidelines to facilitate cost-effective access to digital research data from public funding [12]. More recently, this perspective has been reiterated by the G8 Science Ministers in a statement that makes explicit
reference to the fact that open scientific research data should be easily discoverable, accessible, assessable, intelligible, useable, and wherever possible interoperable to specific quality standards [13]. In the context of this widespread recognition of the inherent worth of research data and its potential for reuse, the Research Data Alliance (RDA) was established in March 2013 with the support of the European Commission, the US Administration, and the Australian Government. The purpose of the RDA is to accelerate international data-driven innovation and discovery by facilitating research data sharing and exchange, use and re-use, standards harmonization, and discoverability. To achieve these objectives, the RDA is focused on the development and adoption of infrastructure, policy, practice, and standards. In the domain of engineering materials, these objectives are being pursued by the recently established RDA/CODATA Materials Data, Infrastructure & Interoperability interest group (MDII IG). While the MDII IG is one of many RDA interest groups tackling discipline-specific issues, there are also many cross-cutting interest groups, several of which are concerned with different aspects of data publishing.

1.2.2. Trends in data publishing

The emergence of data journals and of publisher policies on data management suggests a paradigm change in the way publishers perceive data and their place in the publication process. While some publishing houses host their own repositories, a key development in data publishing came with the establishment in 2009 of the DataCite Consortium and its framework for data citation based on digital object identifiers (DOIs). This framework allows any data centre that meets DataCite quality requirements to enable the data sets it hosts for citation, which gives researchers a wide choice of potential hosts for their data. As shown in Fig. 1, a query of the form http://search.datacite.org/api?q=*&rows=0&facet=on&facet.range=minted&facet.range.gap=%2B1YEAR&facet.range.start=2009-01-01T00:00:00Z&facet.range.end=2016-01-01T00:00:00Z indicates that, of the more than six million DataCite DOIs minted since 2009, over a third were registered in 2015. Another example of innovation in data publishing is Data in Brief, an Elsevier journal that provides the opportunity to publish a comprehensive description of data sets, which is often not feasible within a traditional scientific publication. As shown in Fig. 2, Data in Brief statistics posted at http://www.sciencedirect.com/science/journal/23523409 indicate an exponential rise in data articles over the six quarters since the journal came into existence, with approximately 300 publications expected in 2016 Q1. Such growth in data publication and citation is entirely consistent with the findings of the PRC study on importance versus ease of access to scientific outputs, which indicated the importance to researchers of improved access to research data.

Fig. 1. DataCite statistics (as of December, 2015).

Fig. 2. Data in Brief publications (as of December, 2015).

1.3. Technologies for engineering materials data

Engineering materials data cover a broad spectrum of data types, including mechanical, physical, and chemical properties; microstructures; phase properties; kinetics; and thermodynamic properties [15]. Recent initiatives such as the Materials Genome Initiative (MGI) and integrated computational materials engineering (ICME) have given new momentum to the development of technologies for engineering materials data. With their significant dependence on ICT, the MGI and ICME are presently the key drivers for these technologies. In the scope of the Materials Innovation Initiative (MII) of the US Administration, the MGI aims to halve the time-to-market of newly developed materials. The MGI relies on a better integration of related activities, including experiment, computation, and theory, to facilitate more effective materials modelling. To this end, the requirement for an infrastructure for materials data is stated explicitly in the MGI Strategic Plan, which calls for best practices for implementing a materials data infrastructure to be identified, leading to an infrastructure that combines the software, hardware, and community-wide standards to allow discovery, access, and use of materials data [16]. In this context, developing and deploying standards for materials data has consistently been identified as a priority action [17,18]. ICME is a newly emerging discipline largely concerned with physics-based modelling of materials across all length scales. As with the
MGI, technologies for materials data are a prerequisite for ICME [19,20]. While the MGI represents the first concerted effort at a political level to establish a data infrastructure in support of streamlined materials development, there have been prior efforts at industry level by ASTM and ISO. In the domain of materials properties and materials qualification, ASTM E49 was concerned with the computerization and networking of materials property data [21,22]. At ISO, TC 184/SC 4 develops and maintains a multi-part standard for product data representation and exchange, two parts of which deal primarily with the description of materials, namely ISO 10303 Part 45 (Integrated generic resource: Material and other engineering properties) and ISO 10303 Part 235 (Application protocol: Engineering properties for product design and verification). The ISO technologies find application primarily in the aerospace and process industries [23]. Beyond ASTM and ISO, an industry-led initiative to develop a materials markup language (MatML) took place under the auspices of OASIS-Open [24] but was never ratified. Despite the efforts of ASTM, ISO, and OASIS-Open, technologies for engineering materials data have met with limited success. This is in contrast to ASTM and ISO materials testing and product standards, which have been developed and maintained since the time of the industrial revolution and are at the very foundation of the engineering materials industry [25]. Many thousands of standards exist for engineering materials testing and products, thereby assuring the reproducibility of materials supplied by different manufacturers. The real anomaly is that the efforts by standards organizations to develop technologies for engineering materials data have not been more closely integrated with the testing and product standards. It is this circumstance that motivated CEN/WS ELSSI-EMD, a CEN Workshop on the economics and logistics of standards-compliant schemas and ontologies for engineering materials [26]. CEN/WS ELSSI-EMD demonstrated the viability of deriving formats for engineering materials test data from a procedural testing standard, namely the ISO 6892 Part 1 standard for ambient temperature tensile testing. The early work on formats based on testing standards then continued in the frame of CEN/WS SERES, a CEN Workshop on standards for electronic reporting in the engineering sector that investigated the use of product standards as a basis for developing formats for materials pedigree data [27].
The development, adoption, and use of these data formats in conjunction with DataCite DOIs as a means to facilitate the discovery, sharing, and federation of heterogeneous collections of engineering materials data is the subject of this paper.

2. Material and methods


The work to demonstrate the efficacy of standards-compliant data formats in combination with data citation as a means to facilitate the discovery, transfer, and interoperability of engineering materials data has been undertaken using MatDB, the materials database application of the European Commission Joint Research Centre hosted at https://odin.jrc.ec.europa.eu. The work has involved extending the database application with a view to integration with testing facilities, modelling tools, and other database applications and enabling the application to mint DataCite DOIs.


2.1. Systems integration


Typically, systems integration and data interoperability rely on mapping local data structures. In the case of MatDB, this has been achieved by extending the database application with modules that process data compliant with the CEN/WS ELSSI-EMD technical specifications [28]. These modules are implemented in Java and rely on the Simple API for XML (SAX) parser and on RESTful Web Services to expose and consume data.
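As a minimal illustration of the SAX approach, the following Python sketch uses the standard-library `xml.sax` module, whose event-driven `ContentHandler` interface is analogous to the Java SAX API used by MatDB. The element and attribute names (`TensileTest`, `TestCondition`, `Curve`, `Point`, `strain`, `stress`) are hypothetical placeholders; they are not taken from the CEN/WS ELSSI-EMD technical specifications, whose actual schema would have to be followed in practice.

```python
import xml.sax


class TensileTestHandler(xml.sax.ContentHandler):
    """Collects test conditions and curve points as SAX events arrive.

    The element names handled here are hypothetical, not those of the
    CEN/WS ELSSI-EMD technical specifications.
    """

    def __init__(self):
        super().__init__()
        self.conditions = {}    # condition name -> (value, unit)
        self.curve = []         # list of (strain, stress) tuples
        self._current = None    # (name, unit) of the open TestCondition
        self._text = []         # character chunks of the open TestCondition

    def startElement(self, name, attrs):
        if name == "TestCondition":
            self._current = (attrs["name"], attrs.get("unit", ""))
            self._text = []
        elif name == "Point":
            self.curve.append((float(attrs["strain"]), float(attrs["stress"])))

    def characters(self, content):
        # SAX may deliver element text in several chunks, so accumulate it.
        if self._current is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "TestCondition":
            cond_name, unit = self._current
            self.conditions[cond_name] = (float("".join(self._text)), unit)
            self._current = None


def parse_tensile_test(document):
    """Parse an XML string and return (conditions, curve)."""
    handler = TensileTestHandler()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.conditions, handler.curve


if __name__ == "__main__":
    sample = """<TensileTest standard="ISO 6892-1">
  <TestCondition name="temperature" unit="degC">23</TestCondition>
  <Curve>
    <Point strain="0.0" stress="0.0"/>
    <Point strain="0.001" stress="200.0"/>
  </Curve>
</TensileTest>"""
    print(parse_tensile_test(sample))
```

Because SAX streams the document instead of building a tree in memory, a handler of this kind can consume long stress versus strain curves without holding the whole document; the trade-off is that the handler must track its own parsing state, as `_current` does here.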


2.2. MatDB data citation module


MatDB support for data citation is automated and relies on a DataCite service hosted by TU Delft Library [29], the Dutch DataCite member. As is typically the case for DataCite member organizations, TU Delft hosts a DOI minting service built on the DataCite API [30].

This is a RESTful API that relies on HTTP (Hypertext Transfer Protocol) requests to methods that create a DOI for a specified data set. Client services, such as the MatDB data citation module, post HTTP requests to the DataCite endpoint, which then mints the DOI and returns the request status. As well as calling particular methods, each request includes bibliographic data that comply with the DataCite metadata schema [31]. Before a request to mint a DOI can be issued, the MatDB data citation workflow requires that the data set meet certain mandatory requirements in respect of source, material, specimen, test condition, and test result data. Once these discipline-specific requirements are satisfied, the mandatory bibliographic requirements specified by the DataCite metadata schema are met through a combination of discipline-specific data (for example, for the citation title and abstract) and end-user input (for example, for the license). Having met these mandatory requirements, a DOI request can be issued. At this stage, an XML snapshot is created of the collection of discipline-specific and bibliographic data. The snapshot satisfies the requirement that a DOI resolves to a static data set, which guarantees that the state of the data set remains unchanged with respect to the report or publication from which it is referenced. To accommodate a changing data set, the MatDB data citation service relies on versioning as per the relevant annotation to the DataCite metadata schema, so that a single DOI in combination with the DataCite version property can accommodate a change in the state of a data set.
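The pre-minting steps described above can be sketched in Python as follows. The sketch assumes a data set represented as a plain dictionary keyed by the five mandatory sections named in the text (the key names are hypothetical), and it assumes version 3 of the DataCite metadata kernel, using only a subset of its properties. It checks the discipline-specific requirements, serializes the bibliographic data as an XML snapshot carrying a `version` property, and formats a citation in the order DataCite recommends (Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier). It does not contact any minting service; the request to the TU Delft endpoint is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Mandatory discipline-specific sections named in the text; using them
# as dictionary keys is an assumption made for this sketch.
MANDATORY_SECTIONS = ("source", "material", "specimen", "test_condition", "test_result")

# Assumed DataCite metadata kernel (version 3).
NS = "http://datacite.org/schema/kernel-3"
ET.register_namespace("", NS)


def _q(tag):
    """Return a tag name qualified with the DataCite namespace."""
    return f"{{{NS}}}{tag}"


def missing_sections(data_set):
    """List the mandatory sections that are absent or empty."""
    return [name for name in MANDATORY_SECTIONS if not data_set.get(name)]


def build_snapshot(doi, creators, title, publisher, year, version, license_uri=None):
    """Serialize the bibliographic metadata as a static XML snapshot."""
    root = ET.Element(_q("resource"))
    ET.SubElement(root, _q("identifier"), identifierType="DOI").text = doi
    creators_el = ET.SubElement(root, _q("creators"))
    for name in creators:
        creator = ET.SubElement(creators_el, _q("creator"))
        ET.SubElement(creator, _q("creatorName")).text = name
    titles = ET.SubElement(root, _q("titles"))
    ET.SubElement(titles, _q("title")).text = title
    ET.SubElement(root, _q("publisher")).text = publisher
    ET.SubElement(root, _q("publicationYear")).text = str(year)
    ET.SubElement(root, _q("resourceType"), resourceTypeGeneral="Dataset").text = "Dataset"
    # A changed data set keeps its DOI; only this property is updated.
    ET.SubElement(root, _q("version")).text = version
    if license_uri:
        rights = ET.SubElement(root, _q("rightsList"))
        ET.SubElement(rights, _q("rights"), rightsURI=license_uri)
    return ET.tostring(root, encoding="unicode")


def prepare_doi_request(data_set, **bibliographic):
    """Refuse to proceed until every mandatory section is present."""
    missing = missing_sections(data_set)
    if missing:
        raise ValueError("cannot request a DOI; missing: " + ", ".join(missing))
    return build_snapshot(**bibliographic)


def format_citation(snapshot):
    """Format a citation from a snapshot in the order DataCite recommends."""
    root = ET.fromstring(snapshot)
    creators = "; ".join(e.text for e in root.iter(_q("creatorName")))
    title = root.findtext(f"{_q('titles')}/{_q('title')}")
    return (f"{creators} ({root.findtext(_q('publicationYear'))}): {title}. "
            f"{root.findtext(_q('version'))}. {root.findtext(_q('publisher'))}. "
            f"{root.findtext(_q('resourceType'))}. "
            f"https://doi.org/{root.findtext(_q('identifier'))}")
```

In this sketch each call to `prepare_doi_request` produces a fresh snapshot, so a changed data set would be re-serialized under the same DOI with a new `version` value, in the spirit of the versioning approach described above.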

3. Results

The work to enable MatDB for standards-compliant data formats and data citation has so far yielded two systems integrations and a data citation service that has resulted in the publication of more than 5000 data sets (approximately 25% of the total content of the MatDB database).

3.1. Systems integration

Fig. 3 shows the MatDB XML console, which is intended to replicate the console of a tensile test machine. Although not all the data associated with a materials test would be entered in such a console, most of the testing conditions, the specimen dimensions, and the test results would be present, most likely entered as a configuration file. The key difference between the console in Fig. 3 and that of an actual test facility is the data centre settings group, which specifies the connection to the remote database. As shown in Fig. 4, after the simulated tensile test is started, the stress versus strain curve is displayed both graphically and numerically. The format of the numerical data is compliant with the ISO 6892 Part 1 technical specification [28]. Once the simulated test is complete, the data are transferred directly to the MatDB content area associated with the credentials entered in the data centre settings group.

The ISO 6892 Part 1 technical specification [28] has also enabled integration with the Gen IV Materials Handbook hosted at the Oak Ridge National Laboratory (ORNL) [32]. The integration relies on export and import modules that are capable of generating and consuming data formatted in accordance with the technical specification. The availability of the documentary standard on which the technical specification was based allowed the two database applications to be integrated with minimal consultation between the two development teams.
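The export side of such an integration, and the transfer to a remote data centre, can be sketched in Python with the standard library alone. The XML vocabulary below (`TensileTest`, `TestCondition`, `Curve`, `Point`) is a hypothetical placeholder rather than the ISO 6892 Part 1 technical specification [28], and the endpoint URL and HTTP Basic authentication are assumptions standing in for the data centre settings group; neither reflects the actual MatDB interface. The request is prepared but not sent.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET


def export_test(conditions, curve):
    """Serialize test conditions and (strain, stress) points as XML bytes.

    The vocabulary is hypothetical, not the CEN technical specification.
    """
    root = ET.Element("TensileTest", standard="ISO 6892-1")
    for name, (value, unit) in conditions.items():
        ET.SubElement(root, "TestCondition", name=name, unit=unit).text = str(value)
    curve_el = ET.SubElement(root, "Curve")
    for strain, stress in curve:
        ET.SubElement(curve_el, "Point", strain=repr(strain), stress=repr(stress))
    return ET.tostring(root, encoding="utf-8")


def build_transfer_request(endpoint, user, password, payload):
    """Prepare (but do not send) a POST of the payload to a data centre.

    The endpoint and the use of HTTP Basic authentication are assumptions
    standing in for the console's data centre settings group.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/xml; charset=utf-8",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    payload = export_test({"temperature": (23.0, "degC")},
                          [(0.0, 0.0), (0.001, 200.0), (0.002, 260.0)])
    request = build_transfer_request(
        "https://datacentre.example.org/tests",  # hypothetical endpoint
        "demo-user", "demo-password", payload)
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would perform the actual transfer.
```

Keeping serialization and transport in separate functions mirrors the separation between the export module and the connection settings: the same payload could be written to a file for a partner system or posted to a data centre.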


Fig. 3. The test conditions settings group of the MatDB XML console.

Fig. 4. Simulated curve data displayed in the MatDB XML console.


3.2. Citable data sets

MatDB hosts a module that routinely allows data sets to be enabled for data citation using DataCite DOIs. As already mentioned, more than 5000 DOIs have been minted, allowing the corresponding data sets to be referenced in derivative works in exactly the same way as traditional publications. As shown in Fig. 5, the DOI resolves to a landing page with bibliographic data and a link to the actual data set. In the figure, the citation format complies with the DataCite recommendation [31] and includes the authors of the data set, the year of publication, the title of the data set, the version, the publisher, the type of data, and the DOI.

Fig. 5. MatDB DOI landing page.

Fig. 6. Overall results of the PRC survey of importance versus access to research. Courtesy of PRC [6].

4. Discussion

4.1. Research data

Traditionally, the scientific community has published research data primarily in the form of tables and charts in scientific publications. In this circumstance, much of the metadata not relevant to the publication goes unreported, reducing the possibility of reusing the data in derivative works. Further, the data are in a printed form that is not well suited to reuse. It is perhaps for reasons such as these that the PRC survey of researchers undertaken in 2010 identified limited access to scientific data as an issue [6]. As shown in Fig. 6, which presents the overall results of the survey of 3823 researchers, access to scientific data was consistently identified as problematic irrespective of region, organization, or discipline, suggesting an inherent shortcoming in the research process. For anyone working as a data management professional, this shortcoming was self-evident and manifest as a disregard for data beyond the narrow context in which they were
created, generally resulting in data being discarded once they had served their immediate purpose. The requirement for more responsible research data management practices mandated by funding agencies, together with actions by public authorities to place administrative data in the public domain, initiatives to enable data citation and data publication, the Open Data movement, the recognition of research data as a valuable resource [33], and the success of Open Access (to traditional scientific publications), has led to a change in culture that seems set to establish data management as an integral part of the mainstream research process.

4.1.1. Open Access

It can reasonably be argued that Open Access represents a change to the scientific business model comparable to the changes that the web and related technologies have brought to various business domains, including media, telecoms, and finance, where films are streamed online, calls are routinely made using VOIP (voice over IP), and bank accounts and transactions are typically self-serviced. In these business sectors, the web has acted as a disruptive technology, whereby short-term disruption of an existing business process has led to more streamlined and innovative business practices. Similarly, the web has enabled entirely new business models, such as those of eBay and Amazon, where brick-and-mortar has given way to click-and-order. These new ways of conducting business have been made possible by the unprecedented connectivity that the web offers. With respect to Open Access, this connectivity has broken the monopoly on content delivery that previously allowed publishing houses to dictate the terms of scientific publishing. A further parallel can be drawn between Open Access and the changed business models in other sectors, insofar as a balance is typically achieved between the old business models and the new. As in other sectors, where established players have yielded to market pressure and now deliver a broader range of (internet) services, publishing houses have adopted Open Access, but on terms that recognize the added value they deliver. Thus, Gold Open Access has emerged as a means by which publishing houses charge a fee for allowing a scientific publication to be openly accessible.

4.1.2. Open Data

While Open Data builds on the Open Access paradigm, taking advantage of the unprecedented connectivity that the web offers, its realization is a challenge. The reason is simply that the infrastructure and workflows developed for traditional scientific publishing, which date back to the first scientific journals of the 17th century, are largely missing in the case of research data. In terms of infrastructure, a sufficiently wide range of discipline-specific and generic research data repositories, and the technologies for transferring research data, are not readily available. Similarly for workflows, the procedures and best practices for curating and sharing data are not yet integrated into mainstream research. There is clear evidence, though, of a concerted effort to address such shortcomings. For example, the Data Seal of Approval (http://datasealofapproval.org) provides certification for data repositories, aiming at sustainability and trust; Zenodo (https://zenodo.org) and figshare (http://figshare.com) are generic repositories for making research outputs, including data, available in a citable, shareable, and discoverable manner; DataCite allows individual data sets to be cited and resolved; the Thomson Reuters Data Citation Index (DCI) allows data citation metrics to be monitored; and the Digital Curation Centre (DCC) publishes various data management guides and resources. It is thus apparent that the infrastructure and workflows needed for Open Data are coming into existence. However, whether Open Data offers the most effective means for sharing research data is open to question. Open data sets are expected to be available without access restrictions. Although this may be suitable for public data that have no inherent value (their value instead being derived through added-value services), the inherent complexity and value of scientific data may be better served by requiring access to be granted by the data owner.
In this circumstance, the dialogue between the data owner and the party interested in reusing the data has the potential to be of mutual benefit, whereby the data owner can negotiate the terms they consider appropriate to their own interests, while the re-user can gain deeper insights into the data. Further, the shared interests that bring the two parties together have the potential to lead to cooperation on research of common interest. Data citation fosters just such a data sharing paradigm.

4.2. Data citation and publication

Data citation is the practice of enabling data to be cited in the same way as traditional scientific publications and is finding increasing use in the sciences due to its many benefits. Data citation enables reproducibility and transparency (for example, the data sources for impact assessment studies are made available); provides a much-needed incentive to preserve and share data by ensuring attribution and recognition (in the same way as traditional citations); enables discovery, thereby encouraging reuse as well as guarding against redundancy; and provides a mechanism to expose data sets without necessarily making them immediately accessible, which is important where data sets have intrinsic commercial value. For a data set that has been enabled for citation using a DataCite DOI, the citation is equivalent to that of a traditional publication insofar as the data set can be cited in the references section of a report or article using a standard format that includes the data creators (people and/or organizations), a title, and the date of publication. As an example, consider the following reference from a recent nuclear engineering publication [34]:

[32] M. Bruchhausen, F. de Haan, Test data for low cycle fatigue on material 14Cr 1W ODS at 650 °C and 750 °C, JRC Petten, 2013. http://doi.org/10.5290/1000021 to http://doi.org/10.5290/1000033 inclusive, v1.0, [data set].

As is typically the case, the DOI links in this citation resolve to landing pages that display bibliographic data (metadata) about the data sets, including an abstract.
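A range citation of this kind can be expanded mechanically into individual DOI links. The small Python sketch below assumes, as the example suggests, that the DOI suffixes in the range are consecutive integers under a single prefix; that assumption does not hold for DOIs in general, since DOI suffixes are opaque strings chosen by the registering data centre.

```python
def expand_doi_range(prefix, first, last, resolver="http://doi.org/"):
    """Return resolver URLs for prefix/first ... prefix/last, inclusive.

    Assumes purely numeric, contiguous suffixes, as in the cited range.
    """
    if not prefix.startswith("10."):
        raise ValueError("DOI prefixes start with '10.'")
    if last < first:
        raise ValueError("the range end precedes its start")
    return [f"{resolver}{prefix}/{n}" for n in range(first, last + 1)]


links = expand_doi_range("10.5290", 1000021, 1000033)
print(len(links), links[0], links[-1])
```

For the cited range this yields 13 links, from http://doi.org/10.5290/1000021 to http://doi.org/10.5290/1000033, one per data set in the series.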
The landing pages then provide a link to the data sets, which may be either Open Access or restricted access, the latter requiring some form of authentication or explicit permission before the data become accessible. Data citation thus ensures that the creator of a shared data set is acknowledged in derivative works and as such provides the long-needed motivation for researchers to share their data. While the technology on which DataCite DOIs depend, namely the Handle System, has been available for some time and has found application in, for example, the DSpace repository software, it is DataCite that has captured the imagination of the scientific community. This is perhaps because DataCite does not simply deliver a technology for assigning persistent identifiers to data sets. It also supports DOI minting and resolution services; maintains a metadata schema that ensures data sets are properly annotated; and pursues alliances with other service providers, such as the Thomson Reuters Data Citation Index (DCI) and re3data (http://re3data.org). More importantly, the credibility of DataCite is ensured by its membership, which comprises not-for-profit organizations, many of which are well-respected research libraries and government agencies. DataCite members provide local representation and support for researchers and enable data owners, stewards, or archives to assign DOIs to research data; the standing of these members gives end users of DataCite confidence that the services will be sustained over the longer term. The potential impact of data citation extends far beyond simply ensuring that the creator of a shared data set is acknowledged. By making data sets discoverable, data citation can reduce redundancy, i.e. the unnecessary repetition of tests simply because the existence of the data is not known. Further, data citation allows data sets to be cited irrespective of their access level (Open Data remain open and restricted data remain restricted) and provides a basis for aggregating data from different sources.

Please cite this article in press as: T. Austin, Towards a digital infrastructure for engineering materials data, Mater. Discov. (2016), http://dx.doi.org/10.1016/j.md.2015.12.003

4.2.1. Data discovery

Data conservation is often justified simply on the basis of the investment made in creating the data. In the engineering materials sector it seems reasonable to argue that the data from a 10-year creep test or from any thermo-mechanical fatigue test should be conserved simply because of the expense involved in undertaking the test. However, if the data cannot be discovered, they cannot be reused, and the result is a silo that is impressive only in terms of its volume and the financial investment it represents, rather than the extent to which it contributes to future research and development. Data citation allows data to be discovered because the data set is accompanied by bibliographic information that describes the data (in the title and abstract, for example). Importantly, as with traditional publications, this potential for discovery does not affect the access level of the data: open data remain open, while restricted data remain restricted. While advocates of Open Data may argue that there is no value in discovering a restricted data set, it should be understood that where there is inherent value there is a case for restricting access. This is what distinguishes scientific data from public sector data. The latter are typically statistical information with little or no inherent value but with potential for derived value. For example, a company could create an app that identifies parking violation hot spots from parking fine statistics (no inherent value), so that drivers could make an informed decision about where to park, with the app developer likely to benefit from advertising revenue (derived value) from automotive companies.
Scientific data are different insofar as they always have inherent worth, either intellectual or commercial, and it would be naïve to expect such data always to be made openly available. Data citation allows for dialogue with the owner of a restricted data collection, with the possibility that the prospective re-user offers or accepts terms of reuse that are of greater worth than the inherent value of the data. For example, a research group may wish to use a restricted data collection to publish a high-impact paper and offer to list the data owner as a co-author. Alternatively, a company that has developed an advanced material for a specific application could negotiate a license of higher commercial value with a competitor. Data citation thus has the potential for greater impact than Open Data because it facilitates access to data collections of high inherent worth and encourages cooperation that is mutually beneficial to both the data re-user and the data owner. On this basis, data citation can be expected to address the concerns that individual researchers and companies have traditionally had about publishing their data. The increasing numbers of citable data sets and data publications presented in Figs. 1 and 2 suggest that data citation is proving to be an effective driver for data sharing. For the engineering materials community, trends in the use of MatDB provide anecdotal evidence of a transition to a data sharing culture. MatDB delivers data management services primarily to the European research community, and its data sets have typically been restricted access, i.e. the classic data silo scenario. In this circumstance, while MatDB adds value within the scope of a specific project by allowing partners to collect, exchange, and analyse a shared data collection, its value beyond the term of the project is limited.
With the introduction of a data citation service, this pattern of behaviour appears to be changing. In the case of the MATTER Euratom project on materials testing and rules [37], more than 300 validated data sets for P91 and AISI 316 steels were uploaded by the project partners, half from industry and half from research centres. Because the partners chose to enable almost 90% of the data sets for data citation, the accompanying bibliographic records are publicly available from the DOI landing pages. Consequently, although access to the data sets is restricted to MATTER project partners, the data sets can be discovered, and hence there is potential for reuse. More recently, and largely as a consequence of the benefits of the MatDB data citation service, partners in the INCEFAPLUS Euratom project on increasing safety in nuclear power plants by covering gaps in environmental fatigue assessment [38] have chosen to participate in the Horizon 2020 Open Data Pilot.

4.2.2. Data aggregation

In any discipline, there is considerable incentive to aggregate data from distributed, heterogeneous, and fragmented information systems. This, though, is a challenge, because related data sets first need to be discovered before they can be aggregated. Defining technologies that allow the precise description of discipline-specific data sets provides one solution, whereby the data sets can be discovered and aggregated on the basis of the precisely defined metadata. The difficulty with such an approach is that consensus has to be achieved amongst the majority of stakeholders in the specific discipline if a viable technology is to be realized. This is a time-consuming and high-risk process. Even where viable technologies are developed, large-scale adoption may not follow because professionals in a specific discipline are not sufficiently ICT literate to make immediate use of the technologies. As an alternative, the bibliographic data that accompany a data set assigned a DataCite DOI contain sufficient technical information to allow a preliminary aggregation of data coming from different repositories.
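A minimal Python sketch of discovery followed by preliminary aggregation is given below. It assumes, hypothetically, that each DOI landing page exposes a flat metadata record with `material`, `test_type`, `repository` and `access` fields; real DataCite records would carry this information in titles, subjects, or descriptions. The DOIs and repository names are made up. The point it illustrates is that the search runs over public metadata, so restricted data sets are found as readily as open ones.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CitationRecord:
    """Hypothetical public metadata record shown on a DOI landing page."""
    doi: str
    repository: str
    material: str
    test_type: str
    access: str  # "open" or "restricted"; the metadata itself is public either way


def discover(records: List[CitationRecord], material: str, test_type: str) -> List[CitationRecord]:
    """Match on metadata only, so restricted data sets are found as readily as open ones."""
    return [
        r for r in records
        if r.material.casefold() == material.casefold()
        and r.test_type.casefold() == test_type.casefold()
    ]


def aggregate(hits: List[CitationRecord]) -> Counter:
    """Preliminary aggregation: count candidate data sets per (repository, access level)."""
    return Counter((r.repository, r.access) for r in hits)


# Made-up records from two hypothetical repositories.
records = [
    CitationRecord("10.9999/a.1", "RepoA", "P91", "creep", "open"),
    CitationRecord("10.9999/a.2", "RepoA", "P91", "creep", "restricted"),
    CitationRecord("10.9999/b.1", "RepoB", "p91", "Creep", "restricted"),
    CitationRecord("10.9999/b.2", "RepoB", "AISI 316", "creep", "open"),
]

hits = discover(records, "P91", "creep")
print([r.doi for r in hits])  # ['10.9999/a.1', '10.9999/a.2', '10.9999/b.1']
print(aggregate(hits))
```

The counts tell a prospective re-user which repositories to approach and whether access must be negotiated; the actual merge of the data still has to be done separately.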
Although additional work will certainly be needed to create the aggregated data object, including verification of relevance and format conversion, data citation facilitates an iterative approach to data aggregation that offers better prospects than solutions based on the development of discipline-specific technologies.

4.2.3. Managing change

A DataCite DOI enables a data set to be cited in a publication in a reliable and sustainable way. This reliability relates both to the state of the data set and to its availability. The state of the data set matters for reproducibility, whether the aim is to reproduce published results or to reuse the data in derivative works. In either case, once cited, the state of a data set must not change. Where database-resident data sets are enabled for citation, managing the changing state of the data set presents some challenges. This is due to the create, read, update, and delete (CRUD) actions inherent to a relational database system, whereby database records will be subject to change. The MatDB data citation service addresses the changing nature of a database record by serializing the data set at the time the DOI is requested. The DOI thus resolves to a snapshot of a database record rather than directly to the database record itself. When a data set changes, common practice is to mint a new DOI. While this may be convenient, it introduces issues, not least an unnecessary proliferation of DOIs. The DataCite metadata schema documentation [31] refers to the guidelines of the Earth Science Information Partners [35], which recommend assigning unique locators to major versions. However, what constitutes a major version change is ambiguous. For MatDB, the approach is instead guided by object-oriented principles, whereby a data set is considered as a data object, which in turn is a concrete instance of a data class.
With the creation of a data set, a data object comes into existence and can be assigned a DOI, which is the identifier of the digital object. If the data object changes, for example because of a change in the period over which its properties have been measured or simply because of corrections, this represents a change in the state of the object and not a new object. In this circumstance, there is no basis for minting a new DOI because no new data object has been created. Instead, as in a version control system (VCS), MatDB assigns a new version. This approach provides a concrete basis for deciding when to create a DOI, rather than relying on the vagaries of individual repository guidelines or policies. This may not seem particularly important, but if data citation becomes established practice, an accompanying proliferation of DOIs could become an issue: tracking the relationships among a collection of DOIs that represent the same data set would complicate and detract from the added value of data citation.
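The policy just described (one DOI per data object, a new version for each change of state, and a serialized snapshot behind each citation) can be sketched in Python as follows. This is an illustration under assumptions, not the MatDB implementation: the `Registry` and `DataObject` classes, the JSON serialization, the integer version numbers, and the `10.9999` prefix are all hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataObject:
    """One data set: a single DOI for its lifetime, one frozen snapshot per version."""
    doi: str
    state: Dict  # live, mutable record (the database record in the text)
    snapshots: List[str] = field(default_factory=list)

    def snapshot(self) -> int:
        """Serialize the current state; citations point at snapshots, not the live record."""
        self.snapshots.append(json.dumps(self.state, sort_keys=True))
        return len(self.snapshots)  # version numbers start at 1

    def update(self, **changes) -> int:
        """A change of state (e.g. a correction) yields a new version, not a new DOI."""
        self.state.update(changes)
        return self.snapshot()

    def resolve(self, version: int = 0) -> Dict:
        """Return the requested version (0 means latest) as a new dict."""
        return json.loads(self.snapshots[version - 1 if version else -1])


class Registry:
    """Hypothetical minting service: a DOI is issued only when an object is created."""

    def __init__(self, prefix: str):
        self.prefix = prefix
        self.objects: Dict[str, DataObject] = {}

    def create(self, state: Dict) -> DataObject:
        doi = f"{self.prefix}/{len(self.objects) + 1}"
        obj = DataObject(doi, dict(state))
        obj.snapshot()  # version 1 is frozen at the moment the DOI is minted
        self.objects[doi] = obj
        return obj


registry = Registry("10.9999")  # made-up prefix
ds = registry.create({"material": "P91", "test_hours": 5000})
v2 = ds.update(test_hours=10000)  # longer measurement period: new state, same object
```

Only `Registry.create` mints an identifier, so the question "is this a new DOI?" reduces to "is this a new object?", which is the concrete criterion argued for above. Version 1 remains retrievable unchanged for anyone who cited it.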

4.2.4. Risks

Innovations in data publishing introduce risks alongside the benefits. Data citation not only allows the creators of data sets to be acknowledged; the increased exposure also invites closer scrutiny and criticism. While it can be argued that traditional scientific publishing has flourished for centuries under such constraints, increased exposure may prove more damaging for data sets, given that criticism of concrete numeric data and procedures may be more difficult to defend against than criticism of more abstract concepts and theories. Consequently, although increased transparency can only serve to improve science, researchers must be properly informed of both the risks and the benefits of data publication. More problematic is the potential for data proliferation that increased exposure of data sets will promote. The same data set can very easily become replicated in different database systems and at some stage be released into the public domain in the guise of different data objects. Models derived from such data will then be affected, because the replicated data sets introduce a false statistical weighting. To mitigate this risk, some means of digital watermarking for data sets is required.
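The false statistical weighting, and one simple way to detect it, can be illustrated with a short Python sketch. It assumes data sets are plain lists of numbers keyed by an identifier, and it uses a content fingerprint (a SHA-256 hash of a canonical serialization) as a stand-in for the digital watermarking called for above. Unlike a true watermark, this only catches exact replicas (after rounding) and would miss a copy that has been reordered, converted to other units, or subset.

```python
import hashlib
import json
from statistics import mean
from typing import Dict, List


def fingerprint(values: List[float]) -> str:
    """SHA-256 of a canonical serialization of the numbers; identifiers are ignored."""
    canonical = json.dumps([round(v, 6) for v in values])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(datasets: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep only the first data set seen for each fingerprint."""
    seen = set()
    unique = {}
    for name, values in datasets.items():
        fp = fingerprint(values)
        if fp not in seen:
            seen.add(fp)
            unique[name] = values
    return unique


def pooled_mean(datasets: Dict[str, List[float]]) -> float:
    """Mean over all values, treating every value equally."""
    return mean(v for values in datasets.values() for v in values)


datasets = {
    "lab-A": [100.0, 110.0],
    "lab-B": [200.0, 210.0],
    "mirror-of-B": [200.0, 210.0],  # same values re-released under another identity
}

print(round(pooled_mean(datasets), 1))     # 171.7
print(pooled_mean(deduplicate(datasets)))  # 155.0
```

With the mirrored copy included, the pooled mean shifts from 155.0 to about 171.7 because the values from lab B are counted twice.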

4.3. Data reuse


Much as data conservation is only worthwhile if data can be discovered, data sharing is only of value if the data are reused. Reuse can take many forms, such as validating a model, developing a material tailored to a specific environment, or materials forensics in the case of component failure. It is, though, the untapped potential for reuse in different contexts that perhaps offers the greatest promise. Online marketplaces, such as eBay, Gumtree, or Marktplaats, demonstrate the potential for reuse of physical objects in contexts that the vendors could never have imagined. Similarly, it should be anticipated that the huge connectivity of the web will allow data sets to find reuse in contexts beyond those for which they were originally created. While it is impossible to anticipate what the benefits could be, the history of science shows that connections between apparently unrelated technologies are a characteristic of technological development [35]. To benefit from both the anticipated and the unknown potential of data reuse, the immediate challenge is to develop services that promote confidence and trust in shared data sets. Data quality has a significant role to play in this respect. MatDB addresses data quality by placing mandatory requirements on materials pedigree and test result data; by requiring compliance with recognized materials testing standards; and by incorporating a validation step into the data entry workflow. However, even with such measures in place, experience is that end users either use the data in combination with other data sources, as a means to validate already developed models, or in combination with related resources, such as the original reports. Both cases suggest that further measures are required to promote confidence in the data. Historically, peer review has been the method by which confidence and trust are established in the sciences. However, traditional peer review is simply not scalable to data sets, and so there is a pressing requirement for innovative and viable alternatives for establishing data quality and fitness for purpose. In the case of MatDB, consideration is being given to an open peer review mechanism that borrows from the proven methodology of dotcoms such as eBay and Amazon, whereby the quality of the services offered and of the accompanying transactions is assessed through a process reliant on end user feedback.

4.4. Standards for engineering materials data

Irrespective of the method of discovery or the means of data publication, efficient transfer and reuse of data ultimately rely on standard, discipline-specific technologies. Many examples of such technologies exist, of which those that have enabled the web itself to flourish, including HTTP, HTML, CSS, and XML, are a classic example. In the materials sector, technologies that have gained a significant degree of traction include ThermoML [39] and CML [40]. In the engineering materials sector, while technical specifications exist, including MatML and ISO 10303 Parts 45 and 235, their adoption has been limited to a few isolated cases, with the result that maintenance and development of the specifications has proven problematic and data transfer remains an issue to be addressed. In the industrial sector, the lack of any established standard for transferring engineering materials data leads to fragmentation of the data transfer process (between information systems, departments, and organizations), with adverse consequences for materials development and manufacture. Similarly, in the research sector, data collection and analysis are more time-consuming than they need be. It is not unreasonable to suggest that this circumstance also has a bearing on current trends in data publication, as evidenced by Data In Brief (http://www.sciencedirect.com/science/journal/23523409).
Given the significant number of scientific publications that include materials data, if engineering materials data were easily stored and exchanged, a significant number of Data In Brief articles could be anticipated. However, as shown in Fig. 7, relatively few data articles are published in the journal's engineering category. It is in this context that a series of CEN Workshops on standards for engineering materials data have taken place. The methodology for developing the technical specifications for engineering materials test data was established in CEN/WS ELSSI-EMD and documented in CWA 16200:2010 [26]. This methodology relies on a close examination of a specific testing standard with a view to creating a machine-readable implementation. The process involves modelling the standard and creating a technology-specific implementation, a key aspect of which is adherence to the terminology specified in the testing standard.

Fig. 7. Data In Brief publications by category (as of December, 2015).

4.4.1. CEN Workshops on standards for engineering materials

Given that engineering materials are used in all industrial sectors, from the manufacture of nanoscale devices through to superstructures, standards for engineering materials data have the potential for widespread application. While this anticipates a very significant impact across the industrial sector, it also requires that the needs of the various stakeholders are accommodated. To this end, two CEN Workshops have taken place on standards for engineering materials. CEN Workshops allow prenormative research in a specific domain to be undertaken by a small project team, whose work in turn is overseen by registered participants (typically organizations with a vested interest in the domain of the CEN Workshop). As a registered participant in a CEN Workshop, an organization can determine its level of commitment, ranging from simply reviewing the interim report and CEN Workshop Agreement (CWA), through participating in Workshop plenaries, to contributing knowledge and experience to the work of the project team. A CEN Workshop typically lasts one or two years, at the end of which a CWA is delivered that reports on the work and the results. This document is expected either to evolve into or to contribute to a normative standard and for this reason has a finite lifetime of between 3 and 6 years. CEN/WS ELSSI-EMD was the first CEN Workshop on standards for engineering materials; the second was CEN/WS SERES. CEN/WS ELSSI-EMD delivered technical specifications for uniaxial tensile data compliant with ISO 6892 Part 1 [26], while CEN/WS SERES delivered technical specifications for materials pedigree data based in part on the work of ASTM E49 [27]. Both CEN/WS ELSSI-EMD and CEN/WS SERES pursued three strands of activity, namely the business case, governance, and development of the technical specifications. The business case examined the interest of the business community in technologies for engineering materials data and the impact on specific business processes. During CEN/WS ELSSI-EMD, a survey of stakeholders was undertaken that identified a significant interest in such technologies, coupled with concerns about scalability to other test types and a need to harmonize existing solutions. During CEN/WS SERES, case studies were undertaken at three industrial organizations, two in the aerospace sector and one in the power systems sector. These case studies involved a critical examination of the materials delivery and qualification processes, identifying activities that would benefit from efficiency gains with the adoption of electronic data transfer and reporting. The governance strand examined the role of the standards organizations in the longer-term development and maintenance of the technologies. During CEN/WS ELSSI-EMD, the role of the established standards organizations in publishing electronic standards was examined.
With ISO already hosting such standards [42], CEN introduced a similar service at http://uri.cen.eu/cwas. During CEN/WS SERES, consideration was given to potential custodians of the technical specifications for engineering materials data. With the candidate technical committees being concerned either with testing and product specifications (ECISS/COCOR) or with automation systems (CEN/TC 310), a suitable host could not be identified. There was, however, interest in pursuing the matter further, so that the concerned technical committees remain engaged. The technology strand of CEN/WS ELSSI-EMD was concerned with demonstrating the viability of deriving data formats from a documentary standard for materials testing, namely the ISO 6892 Part 1 standard for ambient temperature tensile testing. This proved successful, with the structure of the document, the vocabulary and notation clauses, and the boundary conditions and rules specified in the body of the documentary standard all serving to allow a data format to be derived. With materials test data only having meaning in the context of materials pedigree data, and with the harmonization of existing technologies in mind, CEN/WS SERES examined the viability of developing a unified model for materials pedigree data from existing specifications, namely MatML and ISO 10303; the prior work of ASTM E49; and documentary product standards. While the work delivered a high-level model for an engineering material, its further development will depend on the rationalization of the relevant vocabulary standards.
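To make the idea of a data format derived from a documentary standard more concrete, the Python sketch below builds a tensile test record whose element names reuse ISO 6892-1 symbols (S_o, F_m, L_o, L_u, S_u, R_m, A, Z) as the shared vocabulary, rejects records that lack mandatory fields, and derives results using the standard's definitions R_m = F_m / S_o, A = (L_u - L_o) / L_o × 100 and Z = (S_o - S_u) / S_o × 100. The element layout, the mandatory-field list, and the units (N, mm, mm²) are assumptions for illustration; this is not the schema delivered by CEN/WS ELSSI-EMD.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical mandatory inputs, named after ISO 6892-1 symbols (units: N, mm, mm^2).
MANDATORY = ("S_o", "F_m", "L_o", "L_u", "S_u")


def validate(measured: Dict[str, float]) -> None:
    """Reject records that lack mandatory fields or have non-positive reference values."""
    missing = [name for name in MANDATORY if name not in measured]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    if measured["S_o"] <= 0 or measured["L_o"] <= 0:
        raise ValueError("S_o and L_o must be positive")


def derive(measured: Dict[str, float]) -> Dict[str, float]:
    """Tensile strength (MPa), elongation after fracture (%), reduction of area (%)."""
    return {
        "R_m": measured["F_m"] / measured["S_o"],
        "A": (measured["L_u"] - measured["L_o"]) / measured["L_o"] * 100.0,
        "Z": (measured["S_o"] - measured["S_u"]) / measured["S_o"] * 100.0,
    }


def to_xml(material: str, measured: Dict[str, float]) -> str:
    """Serialize measured and derived values using the standard's symbols as element names."""
    validate(measured)
    root = ET.Element("TensileTest", {"standard": "ISO 6892-1", "material": material})
    for group, values in (("Measured", measured), ("Results", derive(measured))):
        parent = ET.SubElement(root, group)
        for symbol, value in values.items():
            ET.SubElement(parent, symbol).text = f"{value:g}"
    return ET.tostring(root, encoding="unicode")


record = {"S_o": 50.0, "F_m": 30000.0, "L_o": 50.0, "L_u": 61.0, "S_u": 30.0}
xml_text = to_xml("P91", record)
print(xml_text)
```

Because the element names are the symbols defined in the documentary standard, a materials expert and a software developer reading the record share one vocabulary, and a consumer can recompute R_m, A and Z from the measured block to check the reported results.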

4.4.2. Adoption

CEN Workshops were chosen as the platform for the development of technologies for engineering materials data precisely because prior efforts have failed to find widespread adoption. Standards are intended to enable innovation, and thus engagement with standards organizations is synonymous with industrial engagement. Although it cannot be claimed that industry has chosen to adopt the CEN/WS ELSSI-EMD and CEN/WS SERES technologies, there is now a better understanding of the need for such technologies and of their potential benefits, so that the industrial community can make informed decisions about how best to proceed. While there are concerns, including the potential disruption of existing business processes and the additional burden on already overworked technical committees, a dialogue with stakeholders has been established that will ensure that these concerns and reservations can be taken into consideration.


Demonstrating the technologies will prove as important as stakeholder engagement in encouraging adoption of technologies for engineering materials data. To date, the proofs of concept involving MatDB have demonstrated that technologies derived from documentary standards for materials testing are fit for purpose. For example, because systems integration typically involves collaboration between different software development teams working at different geographic locations, effective communication is a key factor in the success of a systems integration project. In this context, when integrating materials systems, establishing a common understanding of the fields or properties to which data correspond typically involves communication between materials experts and software developers from all teams. This process is both error-prone and time-consuming. In the integration of the Gen IV Handbook and MatDB, standards-compliant data formats provided a very effective solution to this issue, whereby the documentary standard provided a common vocabulary available to the different development teams [32]. Further, where standardization organizations may be concerned that freely available technical specifications could adversely affect sales of their documentary counterparts, the integration of the Gen IV Handbook and MatDB demonstrated that the technologies and the documentary standards are complementary, so that use of the technologies will promote use (and hence procurement) of the corresponding documentary standards. With the CEN Workshop technologies for engineering materials data having been available for only a relatively short time, the opportunities to demonstrate them have been limited. With related CEN Workshops set to continue and with more planned, new opportunities to demonstrate the technologies will help promote their adoption.

4.4.3. Structural materials in the nuclear energy sector

With demand for materials capable of withstanding the more extreme conditions associated with new reactor concepts, and with consideration being given to extending the operational lifetimes of current reactors to 60 or even 80 years, there is a pressing need for technologies that enable data capture and exchange. The CEN Workshops on standards for engineering materials data are thus motivated in the first instance by the need for efficient storage and transfer of nuclear materials information [3]. In the context of structural materials for nuclear energy applications, the streamlined transfer of engineering materials data will ensure easier access to information, greater transparency of information, improved analysis of data, aggregation of distributed data, and more effective systems integration [3], all of which will serve to improve nuclear safety and thereby confidence in nuclear energy as a component of the energy mix. The continued interest in the nuclear energy sector in undertaking prenormative research on technologies for engineering materials is evidenced by a third CEN Workshop on the topic of data formats for fatigue data [41].

5. Conclusions

Compelling reasons have been presented to suggest that research data management will become an integral part of the mainstream research process. An important component of the changing culture is data citation. Data citation seems set to change science fundamentally, in large part due to the bibliographic metadata that accompany a published data set, which not only ensure that the data creator is credited for their work but also enable discovery and aggregation of data sets. In the engineering materials sector, much-needed technologies for data are the focus of an ongoing series of CEN Workshops. In an effort to engage with stakeholders, these technologies are closely coupled with the extensive collection of documentary standards that provide the foundation for materials testing, qualification, and production. While much work remains, the technical specifications delivered to date have been shown to be fit for purpose. The future for shared research data is very promising. The DataCite metadata schema offers possibilities for data aggregation and the development of machine interfaces. In combination with the emerging CEN specifications, such technologies have the potential to deliver an infrastructure for engineering materials data.

Acknowledgements

The contributions from Sebastian Peters (DataCite), Gian Franco Casula (European Research Council), and Allister Smith (software consultant) are gratefully acknowledged. For the CEN Workshop on standards for engineering materials data, credit is due to the members of the Project Teams, namely Alexandre Faget (Granta Design), Aurélie Virgili (Agathis Consulting), David Leal (Caesar Systems Ltd), John Rumble (R&R Data Services), Malcolm Loveday (Beta Technology Ltd), and to their Chair, Chris Bullough (GE Power).

References

[1] N. Swindells, The representation and exchange of materials and other engineering properties, Data Sci. J. 8 (2009) 190–200, Available from: https://www.jstage.jst.go.jp/article/dsj/8/0/8_008-007/_pdf (retrieved 5.11.15).

[2] RIN and the British Library, Patterns of information use and exchange: case studies of researchers in the life sciences, Research Information Network, 2009, Available from: http://www.rin.ac.uk/system/files/attachments/Patterns_information_use-REPORT_Nov09.pdf (retrieved 5.11.15).

[3] COM(2013) 561 final. The annual Union work programme for European standardization. Available from: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=COM:2013:0561:FIN:EN:PDF (retrieved 5.11.15).

[4] COM(2012) 401 final. Towards better access to scientific information: boosting the benefits of public investments in research.
Available from: https://ec.europa.eu/research/science-society/document_library/pdf_06/era-communication-towards-better-access-to-scientific-information_en.pdf (retrieved 5.11.15).

[5] Commission Recommendation of 17.7.2012 on access to and preservation of scientific information (2012/417/EU). Available from: http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32012H0417&qid=1445980987649&from=EN (retrieved 5.11.15).

[6] Access vs. importance: a global study assessing the importance of and ease of access to professional and academic information, phase 1 results. Publishing Research Consortium. Available from: http://publishingresearchconsortium.com/index.php/prc-documents/prc-research-projects/19-prc-access-vs-importance/file (retrieved 5.11.15).

[7] Elsevier submission to Office of Science and Technology Policy public consultation on Public Access to Digital Data Resulting from Federally Funded Scientific Research, 2012, Available from: http://www.elsevier.com/__data/assets/pdf_file/0007/78496/OSTP_Data_Response.pdf (retrieved 5.11.15).

[8] S. Jones, Curation policies and support services of the main UK research funders, Version 2.2, Digital Curation Centre, 2012, Available from: http://www.dcc.ac.uk/webfm_send/778 (retrieved 5.11.15).

[9] UK Institutional Data Policies, Digital Curation Centre. Available from: http://www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies (retrieved 5.11.15).

[10] Commission decision of 12 December 2011 on the reuse of Commission documents (2011/833/EU). Available from: http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=OJ:L:2011:330:FULL&from=EN (retrieved 5.11.15).

[11] Guidelines on Data Management in Horizon 2020, Version 1.0, December 2013, Available from: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf (retrieved 5.11.15).

[12] OECD Principles and Guidelines for Access to Research Data from Public Funding.
Available from: http://www.oecd.org/sti/sci-tech/38500813.pdf (retrieved 5.11.15). [13] G8 Science Ministers Statement. Available from: https://www.gov.uk/ government/news/g8-science-ministers-statement (retrieved 5.11.15). [14] P. Cruse, DataCite is an Exciting Place to Be. Available from: http://blog.datacite. org/datacite-is-an-exciting-place-to-be (retrieved 27.11.15). [15] Building the Materials Data Infrastructure: A Materials Community Planning Workshop, Arlington VA, Final Report, 2015, Available http://www.asminternational.org/documents/10192/19715738/ from:

Please cite this article in press as: T. Austin, Towards a digital infrastructure for engineering materials data, Mater. Discov. (2016), http://dx.doi.org/10.1016/j.md.2015.12.003

848 849 850 851 852 853 854 855 856

Q3

857

858

859

860 861 862 863 864 865 866 867

868

869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919

G Model MD 8 1–12 12 920 921 922

[16]

923 924 925

[17]

926 927

[18]

928 929

[19]

930 931 932

[20]

933 934 935

[21]

936 937

[22]

938 939

[23]

940 941

[24]

942 943 944

[25]

945 946 947 948

[26]

949 950 951 952

[27]

953 954 955 956

[28]

ARTICLE IN PRESS T. Austin / Materials Discovery xxx (2016) xxx–xxx

Materials+Community+Planning+Workshop+Report 030515/3960c291d2d3-446c-95d7-89740a57a233 (retrieved 5.11.15). Materials Genome Initiative Strategic Plan, 2014, Available from: http://www. whitehouse.gov/sites/default/files/microsites/ostp/NSTC/mgi strategic plan dec 2014.pdf (retrieved 4.11.15). J. Warren, R. Boisvert, Building the Materials Innovation Infrastructure: Data and Standards, NISTIR 7898, 2012, http://dx.doi.org/10.6028/NIST.IR.7898. C. Ward, J. Warren, Materials Genome Initiative: Materials Data, NISTIR 8038, 2015, http://dx.doi.org/10.6028/NIST.IR.8038. C. Ward, J. Warren, R. Hanisch, Making materials science and engineering data more valuable research products, Integr. Mater. Manuf. Innov. 3 (22) (2014), http://dx.doi.org/10.1186/s40192-014-0022-8. S. Arnold, F. Holland Jr., B. Bednarcyk, E. Pineda, Combining material and model pedigree is foundational to making ICME a reality, Integr. Mater. Manuf. Innov. 4 (4) (2015), http://dx.doi.org/10.1186/s40192-015-0031-2. S. Nishijima, S. Iwata (Eds.), Computerization and Networking of Materials Databases: Fifth Volume, ASTM STP 1311, 1995. C. Newton (Ed.), ASTM Manual on the Building of Materials Databases, ASTM Manual Series: MNL 19, 1993. A. Moreno, Interoperability for Digital Engineering Systems, Franco Angeli, 2015, ISBN 978-8891706003. B. Johanson (Ed.), Material Markup Language, Public Review Draft 01, 06 June 2006, 2006, Available from: http://docs.oasis-open.org/materials/materialsmatml-spec-pr-01.pdf (retrieved 18.11.15). M. Loveday, R. Skelton, High Temperature Mechanical Testing: A Review and Future Directions, in: ECF12-12th European Conference on Fracture, Sheffield, 1998, Available from: http://www.htmtc.com/Loveday&Skelton ECF12.PDF (retrieved 18.11.15). CWA 16200:2010, A Guide to the Development and Use of Standards compliant Data Formats for Engineering Materials Test Data, 2010, Available from: ftp://ftp.cen.eu/CEN/Sectors/List/ICT/CWAs/CWA16200 2010 ELSSI.pdf (retrieved 4.11.15). 
CWA 16762:2014, ICT Standards in Support of an eReporting Framework for the Engineering Materials Sector, 2014, Available from: http://www.cen.eu/work/ areas/ICT/eBusiness/Pages/WS-SERES.aspx (retrieved 18.11.15). CWA 16200:2010 Technical Specifications, 2010, Available from: http://uri.cen. eu/cwas/16200/2010/ed-01/technology/schema/xsd/iso6892-1.xsd (retrieved 8.12.15).

[29] DataCite Netherlands. Available from: http://datacite.tudelft.nl/info/home (retrieved 18.11.15). [30] DataCite Metadata Store API Documentation. Available from: https://mds. datacite.org/static/apidoc (retrieved 18.11.15). [31] DataCite Metadata Schema v 3.1. Available from: https://schema.datacite.org/ meta/kernel-3/index.html (retrieved 18.11.15). [32] L. Lin, T. Austin, W. Ren, Interoperability of materials database systems in support of nuclear energy development and potential applications for fuel cell material selection, Mater. Perform. Charact. 4 (2015), http://dx.doi.org/10. 1520/MPC20150003. [33] J. Houghton, N. Gruen, Open Research Data, Australian National Data Service, 2014, Available from: http://ands.org.au/resource/open-research-data.html (retrieved 8.12.15). [34] M. Bruchhausen, K. Turba, F. de Haan, P. Hähner, T. Austin, Y. Carlan de, Characterization of a 14Cr ODS steel by means of small punch and uniaxial testing with regard to creep and fatigue at elevated temperatures, J. Nucl. Mater. 444 (2014) 283–291, http://dx.doi.org/10.1016/j.jnucmat.2013.09.059. [35] Interagency Data Stewardship/Citations/provider guidelines – Federation of Earth Science Information Partners. Available from: http://wiki. esipfed.org/index.php/Interagency Data Stewardship/Citations/provider guidelines#Note on Versioning and Locators (retrieved 10.12.15). [36] J. Burke, Connections, Simon & Schuster, 2007, ISBN 978-0743299558. [37] MATTER, MATerials TEsting and Rules. Available from: http://cordis.europa.eu/ project/rcn/97428 en.html (retrieved 14.12.15). [38] INCEFA-PLUS, INcreasing Safety in NPPs by Covering Gaps in Environmental Fatigue Assessment. Available from: http://cordis.europa.eu/project/rcn/ 197289 en.html (retrieved 14.12.15). [39] ThermoML – An XML-based IUPAC Standard for Thermodynamic Property Data. Available from: http://www.iupac.org/namespaces/ThermoML (retrieved 14.12.15). [40] Chemical Markup Language – CML. 
Available from: http://www.xml-cml.org (retrieved 14.12.15). [41] CEN Workshop on standards compliant formats for fatigue test data – INCEFA. Available from: http://www.cen.eu/news/workshops/Pages/WS2015-012.aspx (retrieved 14.12.15). [42] ISO Standards Maintenance Portal. Available from: http://standards.iso.org (retrieved 14.12.15).
