Toolkit for intensive work with metadata in specialized information systems

Available online at www.sciencedirect.com

ScienceDirect

www.elsevier.com/locate/procedia

Procedia Computer Science 119 (2017) 59–64

6th International Young Scientists Conference in HPC and Simulation, YSC 2017, 1-3 November 2017, Kotka, Finland

Andrey Polyakov (a,*), Dmitry Kokovin (a), Alexey Poyda (a), Mikhail Zhizhin (b), Alexander Andreev (b), Alexander Govorov (b,c) and Viacheslav Ilyin (a,d)

(a) NRC Kurchatov Institute, Moscow, Russia
(b) Space Research Institute of the Russian Academy of Science, Moscow, Russia
(c) Moscow State University of Geodesy and Cartography, Moscow, Russia
(d) Moscow Institute of Physics and Technology (MIPT), Moscow, Russia

Abstract

We present a concept of an information system for intensive work with metadata in a heterogeneous data store, in which the user can dynamically modify both the data models and the metadata models. The concept is illustrated by an example of such an information system, built for a project in which a digital registrar is being developed for the cultural heritage of Old Russian painting, namely the Dionisius frescoes in the Cathedral of the Nativity of the Virgin at the Ferapontov Monastery, painted in 1502.

© 2018 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the scientific committee of the 6th International Young Scientist conference in HPC and Simulation.

Keywords: metadata, data model, information system, Dionisius frescoes, digital registrar

* Corresponding author.
E-mail address: [email protected]

1877-0509 © 2018 The Authors. Published by Elsevier B.V.
10.1016/j.procs.2017.11.160

Andrey Polyakov et al. / Procedia Computer Science 119 (2017) 59–64

1. Introduction

In many applications, a key problem is choosing an information system that supports effective work with data. Many IT solutions now offer a wide spectrum of services to meet different application requirements, such as the cost of the system and its technical support, flexibility, scalability, openness, and the choice of DBMS. Among the most popular information systems of this type are CRM (Customer Relationship Management) systems [1], which target the automation of working processes in companies and organizations. For small companies and organizations, however, CRMs are too expensive. In these cases, work with data is reduced to storing it in a file system on local disks or in public cloud resources [2] such as GoogleDrive and OneDrive.

It is worth noting that decisive factors, such as data types, are fixed in existing information systems. This causes problems when the heterogeneity of the data changes during the life of an application project. Dynamic modification of data models and of the types of connections between them is becoming more and more important, as are the development of system functionality and of methods for data processing and analysis. The time dimension is becoming important too, when data arriving over time must be joined into tracks according to a selected data attribute. An illustrative example is a medical register, where the data of medical examinations and analyses should be collected into time tracks for each patient.

For such data attributes as heterogeneity, links to other data (e.g. time tracks), the binding of data models to processing and analysis methods, and the history of data creation and change, the special terminological concept of metadata (data about data) has been developed extensively over the last 10-15 years. Ontologies [3] can be mentioned as one of the most productive developments in this direction.

However, work with metadata is static in existing IT solutions: there is no possibility of dynamically modifying data models or forming new ones. In what follows we use the term data for input data and for derived data of the same type, and the term metadata for derived data describing the data structure, details of data provenance (history), links with other data or with metadata, etc. It thus remains an open problem to develop a toolkit that lets users organize the dynamic modification of data and metadata models in a transparent and easy way.

The formulations above are conceptual, but we do not develop the corresponding theoretical constructions in depth here. Instead, we present the key points of a realization of this concept for a specific class of applications, where

• data are stored in a file system, while metadata can be placed in a relational database to ensure high performance for search and filtering;
• users work with metadata while preparing data processing jobs, and the actual data are used only when a job is performed.
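The separation described in the two points above can be sketched as follows. This is a minimal illustration and not the SuBMiT code: it uses Python's built-in sqlite3 module as a stand-in for a production relational database, and the table name `image_meta`, its columns, and the file paths are hypothetical.

```python
import sqlite3


def build_catalog(records):
    """Create an in-memory metadata catalog; the image files stay on disk."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE image_meta (
            id          INTEGER PRIMARY KEY,
            path        TEXT NOT NULL,  -- link to the file in the file system
            composition TEXT,
            band        TEXT,
            scale       TEXT
        )
        """
    )
    db.executemany(
        "INSERT INTO image_meta (path, composition, band, scale) "
        "VALUES (?, ?, ?, ?)",
        records,
    )
    return db


def prepare_job(db, composition, band):
    """Job preparation reads metadata only and returns the file links."""
    rows = db.execute(
        "SELECT path FROM image_meta "
        "WHERE composition = ? AND band = ? ORDER BY path",
        (composition, band),
    )
    return [path for (path,) in rows]


# Hypothetical records: (path, composition, band, scale).
catalog = build_catalog([
    ("/data/raw/c001_uv_0001.cr2", "c001", "UV", "1:1"),
    ("/data/raw/c001_ir_0001.cr2", "c001", "IR", "1:1"),
    ("/data/raw/c002_uv_0001.cr2", "c002", "UV", "1:1"),
])
job_inputs = prepare_job(catalog, "c001", "UV")
```

In this sketch no image file is opened during search and filtering; only a running job would read the files listed in `job_inputs`.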

As a pilot application, we consider in this paper the information system being developed for the Dionisius frescoes, painted in 1502 on the walls of the Cathedral of the Nativity of the Virgin at the Ferapontov Monastery [4]. The Dionisius frescoes were included in the UNESCO World Heritage List in 2000. In the 1980s a unique project was undertaken for the conservation and preservation of the frescoes [5,6]. Then, in the 2000s, a special heating and climate control system was installed in the Cathedral. As a result, the physical condition of the frescoes is now good and stable. This is an additional argument for creating a specialized information system of the kind described above for this cultural heritage site. An important point is that such a system could (or even should) be extended with data provided by future investigations of the physical condition of the frescoes. The information (data + metadata) stored and created in the system under development will make it possible to capture a snapshot of the current condition of the frescoes at the high resolution and accuracy requested by specialists in heritage conservation [7]. For these purposes the main source of data is macro and micro photography in different spectral bands: UV (ultraviolet), visible and IR (infrared) light.

In what follows, we use the term digital registrar for the above concept as applied to ancient cultural heritage, and the name SuBMiT (Service Based Metadata management sysTem) for the software implementation of this concept.

The paper is organized as follows. In Section 2 we give basic information about the cultural heritage object, the wall paintings (frescoes) by Dionisius in the Cathedral of the Nativity of the Virgin at the Ferapontov Monastery, for which the digital registrar is being created. In Section 3 we discuss the architecture of the SuBMiT toolkit, designed to take into account the specifics of the chosen application.
In Section 4 the pilot software implementation is discussed in detail. In the Conclusions, we formulate the key points of the SuBMiT toolkit proposal in relation to the digital registrar concept.

2. Digital registrar of Dionisius frescoes

According to our concept, the digital registrar for the complex of compositions of the Dionisius frescoes will include digital photos in different spectral bands and at various scales, from close-up shots and orthorectified images [8] of each composition up to macro photography at 1:1 scale. In total there are about 300 compositions [9] in the Dionisius frescoes, each a few square meters in size. The typical size of a RAW photo image is about 20 Mbyte. The sensor of the Canon 650D camera measures 14.9 × 22.3 mm, so an image made at 1:1 scale covers an area of the painting of approximately the same size. The resolution provided at this scale is a few microns, which corresponds to the requirements of experts in heritage conservation. This means that 1 m² of painting requires about three thousand photo images if they are made with 40% overlap, which is necessary to allow these photos to be assembled into one image of a composition at the requested resolution. As a result, the total volume of the 1:1 photos is about 60 Gbyte per 1 m². If photos are made in various spectral bands, say in 10 bands from UV to IR, the volume of image data will be about 0.5 Tbyte per 1 m². For the whole of the wall paintings in the Cathedral of the Nativity of the Virgin (~650 m²) one gets a volume of about 300 Tbyte. The digital registrar also needs images at other scales, e.g. at scales of 6 × 9 cm and 20 × 30 cm, to provide navigation over the compositions without loss of accuracy with the help of pyramid technology [10]. One can conclude, therefore, that SuBMiT has to scale up to petabytes.

It is worth noting that input RAW images will not be changed or modified once they are stored in the digital registrar. In the data processing pipeline, users can create new (derived) images from the stored raw photographs.
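The storage estimate given in this section can be reproduced with a few lines of arithmetic. The sketch below uses only numbers quoted in the text (sensor size, 20 Mbyte per RAW frame, 10 bands, ~650 m²); the function names and the use of decimal units (1 Tbyte = 10^6 Mbyte) are assumptions made for illustration. Edge-to-edge tiling already gives roughly three thousand frames per square meter; how the 40% overlap enters the authors' count is not stated, so the `overlap` parameter is an assumed interpretation and defaults to 0.

```python
SENSOR_MM = (14.9, 22.3)   # Canon 650D sensor size quoted in the text
RAW_MBYTE = 20             # typical RAW frame size quoted in the text
BANDS = 10                 # "say in 10 bands from UV to IR"
WALL_AREA_M2 = 650         # approximate painted area quoted in the text


def frames_per_m2(sensor_mm, overlap=0.0):
    """Number of 1:1 frames needed to tile 1 m^2.

    `overlap` is the fraction of each frame side shared with its neighbour
    (an assumed interpretation; 0.0 means edge-to-edge tiling).
    """
    width, height = sensor_mm
    step_area_mm2 = (width * (1 - overlap)) * (height * (1 - overlap))
    return 1_000_000 / step_area_mm2


def volume_tbyte(frames, raw_mbyte, bands=1, area_m2=1):
    """Total volume in Tbyte, using decimal units (1 Tbyte = 1e6 Mbyte)."""
    return frames * raw_mbyte * bands * area_m2 / 1_000_000


tiles = frames_per_m2(SENSOR_MM)                  # frames for edge-to-edge tiling
one_band = volume_tbyte(3000, RAW_MBYTE)          # one band, 1 m^2
all_bands = volume_tbyte(3000, RAW_MBYTE, BANDS)  # ten bands, 1 m^2
whole_wall = volume_tbyte(3000, RAW_MBYTE, BANDS, WALL_AREA_M2)
```

With the round figure of 3000 frames, one band of 1 m² comes to 0.06 Tbyte (60 Gbyte) and ten bands to 0.6 Tbyte, which the text rounds to about 0.5 Tbyte. Multiplying 0.5 Tbyte by 650 m² gives about 325 Tbyte, in line with the quoted "about 300 Tbyte", while the unrounded product is 390 Tbyte. Either way the 1:1 images alone reach hundreds of terabytes, which is consistent with the petabyte-scale requirement.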
An example of a derived image is the so-called orthorectified image of a single composition of the Dionisius frescoes [8]. Note that orthorectification provides a (geometrical) coordinate system for positioning images of different scales within the composition and can be used for navigation over the frescoes. Derived images may also be saved in the file system, and they cannot be changed or modified after saving. As we pointed out above, end users, e.g. specialists in heritage conservation, will later edit and change metadata rather than data (images). The images will be used only by services when jobs or workflows are running. Links to these files (raw and derived images) are elements of their metadata.

3. SuBMiT architecture

The toolkit should be easy for users to install and easy to tune to a concrete application. Services should provide the following functions (this list is preliminary and will be extended later):

• design of new data and metadata models, with the ability to modify them dynamically;
• registration of input data with extraction of initial metadata;
• multi-criteria search on metadata attributes;
• easy integration of external applications and services by users;
• ingesting, restructuring and data processing jobs and workflows;
• data sharing and access permissions for the toolkit resources;
• collaborative work for multiple users.
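The first function in the list, dynamic modification of a metadata model, can be illustrated with a small sketch. It is not taken from SuBMiT: the class `MetadataModel`, its methods, the field-type names and the use of SQLite (via Python's sqlite3 module) are all assumptions chosen to keep the example self-contained. The idea shown is that adding a field to a model at run time becomes a schema change on the table that stores the model's records.

```python
import sqlite3


def _ident(name):
    """Allow only plain identifiers, since names are spliced into SQL text."""
    if not name.isidentifier():
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


class MetadataModel:
    """A user-editable metadata model backed by one relational table."""

    SQL_TYPES = {"text": "TEXT", "number": "REAL", "reference": "INTEGER"}

    def __init__(self, db, name, fields):
        self.db = db
        self.name = _ident(name)
        self.fields = dict(fields)
        columns = ", ".join(
            f"{_ident(field)} {self.SQL_TYPES[ftype]}"
            for field, ftype in self.fields.items()
        )
        db.execute(
            f"CREATE TABLE {self.name} (id INTEGER PRIMARY KEY, {columns})"
        )

    def add_field(self, field, ftype):
        """Extend the model at run time; existing records get NULL."""
        if field in self.fields:
            raise ValueError(f"field {field!r} already exists")
        self.db.execute(
            f"ALTER TABLE {self.name} "
            f"ADD COLUMN {_ident(field)} {self.SQL_TYPES[ftype]}"
        )
        self.fields[field] = ftype


db = sqlite3.connect(":memory:")
model = MetadataModel(db, "photo", {"composition": "text", "exposure_s": "number"})
db.execute(
    "INSERT INTO photo (composition, exposure_s) VALUES (?, ?)", ("c001", 0.5)
)

# A user later decides that photos also need a spectral band attribute.
model.add_field("band", "text")
columns = [row[1] for row in db.execute("PRAGMA table_info(photo)")]
```

Records registered before the change stay valid and simply carry an empty value for the new field, which is one plausible way to let a model evolve without re-ingesting data.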

The architecture proposed for the SuBMiT toolkit is shown in Fig. 1. A short (initial) list of its details is as follows. The key component is the metadata register, whose main elements are data and metadata models [11]. The interface to these models should have a standardized format and will be provided by services. There will be at least three groups of services: system, application and user services. One of the basic functions of system services will be extracting an initial set of metadata from input data, as well as creating further metadata during the user's work. System services should also handle loading jobs onto computing resources, processing queries to the storage (file) system, etc. One of the main functions of user services should be the creation and modification of data and metadata models.

Fig. 1. Architecture of the SuBMiT toolkit.
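As an illustration of the system-service function described in this section, extraction of an initial metadata set when input data are registered, the sketch below collects metadata that can be derived from a file alone. The function name `extract_initial_metadata` and the chosen attributes (absolute path, size, modification time, SHA-256 checksum) are assumptions; the paper does not specify which attributes SuBMiT extracts.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def extract_initial_metadata(path):
    """Return metadata derivable from the file itself, without decoding it."""
    stat = os.stat(path)
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large RAW files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "path": os.path.abspath(path),  # link to the stored file
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        "sha256": digest.hexdigest(),
    }


# Demonstration on a temporary stand-in for a RAW image file.
with tempfile.NamedTemporaryFile(suffix=".raw", delete=False) as tmp:
    tmp.write(b"raw image bytes")
record = extract_initial_metadata(tmp.name)
os.remove(tmp.name)
```

Because stored images are never modified, a checksum recorded at registration could later be used to verify that a file is still intact; this is a design suggestion, not a documented SuBMiT feature.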

4. Pilot implementation of the SuBMiT-Dionisius toolkit

The pilot implementation of the SuBMiT toolkit for the Dionisius frescoes is under development. A working version of the pilot is ready for testing. It is installed on a local cloud system and is accessible at http://submit.webhop.net. It includes, in particular, a web interface with an editor for data and metadata models, a relational database for metadata storage, options for fast search and access to requested data and metadata, services for forming and submitting jobs and workflows, an interface for monitoring the load on computing resources and controlling job completion, and services for working with users. Metadata are formatted as XML structures [12], which are mapped to a MySQL 5.1 relational database [13]. An example of such a scheme is presented in Fig. 2.

Fig. 2. Example of the data model scheme in SuBMiT toolkit.
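The mapping from an XML model description to relational structures can be sketched as follows. The XML layout (`<model>` and `<field>` elements with `name`, `type` and `target` attributes), the type table and the function `model_to_ddl` are hypothetical and are not the format shown in Fig. 2; SQLite stands in for MySQL, and Python for the Java services of the pilot, purely to keep the example short and runnable.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical model description with text, number and reference fields.
MODEL_XML = """
<model name="composition_photo">
  <field name="title" type="text"/>
  <field name="exposure_s" type="number"/>
  <field name="composition_id" type="reference" target="composition"/>
</model>
"""

SQL_TYPES = {"text": "TEXT", "number": "REAL", "reference": "INTEGER"}


def _ident(name):
    """Allow only plain identifiers, since names are spliced into SQL text."""
    if not name or not name.isidentifier():
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def model_to_ddl(xml_text):
    """Parse a model description and return a CREATE TABLE statement."""
    root = ET.fromstring(xml_text)
    table = _ident(root.get("name"))
    columns = ["id INTEGER PRIMARY KEY"]
    constraints = []
    for field in root.findall("field"):
        name = _ident(field.get("name"))
        ftype = field.get("type")
        columns.append(f"{name} {SQL_TYPES[ftype]}")
        if ftype == "reference":
            # A reference field points at the id column of another model.
            target = _ident(field.get("target"))
            constraints.append(
                f"FOREIGN KEY ({name}) REFERENCES {target}(id)"
            )
    return f"CREATE TABLE {table} ({', '.join(columns + constraints)})"


ddl = model_to_ddl(MODEL_XML)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE composition (id INTEGER PRIMARY KEY)")
db.execute(ddl)
```

Keeping the type table in one place means that supporting a new field type only requires one new entry plus any constraint it implies.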



The scheme in Fig. 2 includes fields of different types: text, number, reference, etc. A special service has been created that parses the model (written in XML) and translates it into the corresponding SQL statements, using the rules defined for object-relational mapping. Services for exchange between the data storage and other toolkit components are built with Java Enterprise Edition (J2EE) [14] technology. The Apache Struts 270 framework [15] was used for the controller, which provides the business logic and the interconnection with data and metadata models. The user interface is implemented using the EJB 2.1, Servlets 2.5 and JSP 2.1 specifications [16-18].

5. Conclusions

In conclusion, we emphasize two key points of the above discussion.

First, the information (data + metadata) stored and created in the SuBMiT toolkit under development will make it possible to capture a snapshot of the current condition of the frescoes at the high resolution and accuracy requested by specialists in art heritage conservation (e.g. for the Dionisius frescoes). These specialists can keep adding new data (from time to time, of new types) produced in their future investigations of the physical condition of the Dionisius frescoes. Thus, they will get a unique opportunity to analyze the current physical condition of the frescoes taking into account very detailed information obtained in previous years. Based on this understanding, we formulate the concept of a digital registrar of art heritage using the example of the Dionisius frescoes.

Second, analysis of the work of specialists in art heritage conservation has led us to understand that two key specifics of the information system (the SuBMiT toolkit) should be taken into account in the software realization of the digital registrar: the user works with metadata rather than with data, and dynamic modification of existing data and metadata models and creation of new ones should be among the main functions of the SuBMiT toolkit, available to users in an easy way.
Acknowledgements

The work on the SuBMiT-Dionisius toolkit is supported by RFBR grant # 16-07-01177. This work has been carried out using computing resources of the federal collective usage center Complex for Simulation and Data Processing for Mega-science Facilities at NRC “Kurchatov Institute”, http://ckp.nrcki.ru/.

References

[1] Viliam L. (2009) “CRM architecture for effective enterprise relationship marketing”, Journal of Information, Control and Management Systems 1: 7-11.
[2] Wycislik M. and Ellis C. (2017) “The best free cloud storage service 2017”, retrieved from http://www.techradar.com/news/the-best-free-cloudstorage.
[3] Janis A., Bubenko jr. (2007) “From Information Algebra to Enterprise Modelling and Ontologies - a Historical Perspective on Modelling for Information Systems”, in John Krogstie et al. (eds) Conceptual Modelling in Information Systems Engineering, pp. 1-18.
[4] Museum of Dionysius Frescoes, http://www.dionisy.com/monastery/. In Russian.
[5] Lelekova O. and Naumova M. (1989) “Wall paintings by Dionisius in the Cathedral of the Nativity of the Virgin at the Ferapontov Monastery (results of the restoration investigations)”, in G. Popov (ed.) Old Russian art. Art heritages of North Russia, Moscow, Nauka, pp. 63-68. ISBN 502-012713-2. In Russian.
[6] Lelekova O. (2005) “On the experience of the conservation work in the Cathedral of the Nativity of the Virgin at the Ferapontov Monastery”, in Trezvov A. and Lishiz L. (eds.) Proceedings of International Conference “Research in conservation of the culture heritage” (Moscow, 12-14 October 2004), Moscow, Publishing house “Indrik”, pp. 159-164. ISBN 5-85759-334-4. In Russian.
[7] Bregman N., Chistyakov V. (2005) “On the digital documentation of the color changes in painting”, in Trezvov A. and Lishiz L. (eds.) Proceedings of International Conference “Research in conservation of the culture heritage” (Moscow, 12-14 October 2004), Moscow, Publishing house “Indrik”, pp. 37-40. ISBN 5-85759-334-4. In Russian.
[8] Gopala Krishna B. (2006) “Orthoimage generation”, in “Tutorial: Extraction of Geospatial information from high spatial resolution optical satellite sensors”, Section 5, retrieved from http://www.isprs.org/education/PDF/GOA_BGK_Orthoimage_Section5.pdf.
[9] Shelkova E. (2005) “Guide on the compositions on the Dionisius wall paintings made in 1502 in the Cathedral of the Nativity of the Virgin at the Ferapontov Monastery”, Publishing house “Northern pilgrim”, pp. 1-151. In Russian.
[10] Jaime R. (2008) “A deepzoom primer (explained and coded)”, retrieved from https://blogs.msdn.microsoft.com/jaimer/2008/04/01/adeepzoom-primer-explained-and-coded.
[11] niso.org (2004) “Understanding metadata. National Information Standards Organization (NISO)”, retrieved from http://www.niso.org/standards/resources/UnderstandingMetadata.pdf.
[12] w3.org (2017) “Extensible Markup Language (XML) 1.1 (Second Edition)”, World Wide Web Consortium, retrieved from https://www.w3.org/TR/xml11/.
[13] docs.oracle.com (2017) “MySQL 5.1 Release Notes”, retrieved from https://docs.oracle.com/cd/E17952_01/mysql-5.1-relnotesen/index.html.
[14] oracle.com (2017) “Java EE Overview. Oracle Corporation”, retrieved from http://www.oracle.com/technetwork/java/javaee/overview/index.html.
[15] struts.apache.org (2017) “An open source framework for creating Java EE web applications”, retrieved from https://struts.apache.org/.
[16] oracle.com (2017) “EJB 2.1 Specification”, retrieved from http://www.oracle.com/technetwork/java/javaee/ejb/index.html.
[17] oracle.com (2017) “Servlets 2.5 Specification”, retrieved from http://www.oracle.com/technetwork/java/index-jsp-135475.html.
[18] oracle.com (2017) “JSP Specification 2.1”, retrieved from http://www.oracle.com/technetwork/java/javaee/jsp/index.html.