Challenges in automatic Munsell color profiling for cultural heritage


Journal Pre-proof

Filippo Luigi Maria Milotta, Giuseppe Furnari, Camillo Quattrocchi, Stefania Pasquale, Dario Allegra, Anna Maria Gueli, Filippo Stanco, Davide Tanasi

PII: S0167-8655(19)30371-X
DOI: https://doi.org/10.1016/j.patrec.2019.12.008
Reference: PATREC 7734

To appear in: Pattern Recognition Letters

Received date: 16 July 2019
Revised date: 2 December 2019
Accepted date: 8 December 2019

Please cite this article as: Filippo Luigi Maria Milotta, Giuseppe Furnari, Camillo Quattrocchi, Stefania Pasquale, Dario Allegra, Anna Maria Gueli, Filippo Stanco, Davide Tanasi, Challenges in automatic Munsell color profiling for cultural heritage, Pattern Recognition Letters (2019), doi: https://doi.org/10.1016/j.patrec.2019.12.008

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier B.V.

Highlights
• Extension of the ARCA dataset (about 4.5 times the images and 5.6 times the samples).
• Generalization tests of color specification performed with a classification approach.
• The proposed procedure for rendering synthetic images enables future deep learning approaches.

Pattern Recognition Letters journal homepage: www.elsevier.com

Filippo Luigi Maria Milotta (a, **), Giuseppe Furnari (a), Camillo Quattrocchi (a), Stefania Pasquale (b), Dario Allegra (a), Anna Maria Gueli (b), Filippo Stanco (a), Davide Tanasi (c)

(a) University of Catania, Department of Mathematics and Computer Science, Via Santa Sofia 64, Catania 95125, Italy
(b) University of Catania, Department of Physics and Astronomy "Ettore Majorana" & INFN-CT, Via Santa Sofia 64, Catania 95125, Italy
(c) University of South Florida, Department of History, 4202 E. Fowler Ave, Tampa 33620, Florida

ABSTRACT

Color specification is the process of measuring the color of a sample in a given color space. We focus on the Munsell color space because archaeologists routinely use the Munsell Soil Color Charts (MSCCs) directly at excavation sites. For these scholars and researchers, automatic Munsell color specification is crucial, as they spend a lot of time specifying colors subjectively in the Munsell system. We extended ARCA328, a dataset designed specifically for automatic Munsell color specification, increasing the number of images from 328 to 1,488 and the number of samples from 56,160 to 315,333. We then conducted generalization tests of color conversion for color specification, adopting a classification approach instead of a regression one. This choice was motivated by the fact that the set of hue, value and chroma (HVC) coordinates available in the MSCCs is discrete. Hence, we treat each chip in the MSCCs as a class to be learnt and recognized by a Support Vector Classifier (SVC). These tests highlight the limits of automatic Munsell color specification without any reference system or calibration phase. Finally, we give insights for future work aimed at designing an automatic illuminant calibration phase and at investigating deep learning approaches, leveraging a procedure for rendering synthetic images that we also present in this work.

© 2019 Elsevier Ltd. All rights reserved.

** Corresponding author: Tel.: +39-095-738-3051; fax: +39-095-330-094; e-mail: [email protected] (Filippo Luigi Maria Milotta)

1. Introduction

Color specification is the process of measuring the color of a sample in a given color space. The observer and the physical illumination conditions (i.e., the measurement geometry) must be properly defined throughout the process in order to obtain meaningful and representative color coordinates. Color specification is useful in many contexts, including quality assessment in industrial processes (Klein and Meyrath, 2010), investigation of food ingredients (Koh, 2007), resin composites in dentistry (Gueli et al., 2017), shading of paints (Rodrigues, 2004), and restoration and conservation of Cultural Heritage (Stanco et al., 2011a; Zacharias, 2018). In Cultural Heritage studies and archaeology in particular, color specification is crucial for determining the significance of artefacts (Jones and MacGregor, 2002).

However, the only method available so far, Munsell color specification (Kuehni, 2002), which is widely used to classify chromatic information on textiles, paintings, soils, stone, glass, metal and solid organic matter such as bone and ivory, has often been questioned as inaccurate and subjective (Gerharz et al., 1988; Pegalajar et al., 2019). The Munsell color space was defined by Albert H. Munsell (Munsell, 1915), who designed a 3D color system based on hue, value and chroma (referred to as HVC), similar to the hue, saturation and value (HSV) color space. The Munsell space can be regarded as an irregular solid resembling a cylinder, despite initial efforts to constrain it into a regular one. The height of the cylinder defines the lightness, also known as value, from 0 to 10; the 360° around the central axis define the hue; the radial distance from the central axis defines the chroma. In principle chroma is unbounded, which is what makes the solid irregular; MacAdam showed that it is actually bounded by limits that change with the hue (MacAdam, 1935).

Another particular characteristic of the Munsell color space is that only a discrete subset of all the possible HVC coordinates is commonly employed. This is because the Munsell color space is usually adopted by archaeologists directly at excavation sites, through the so-called Munsell Soil Color Charts (MSCCs) (Gerharz et al., 1988; Ferguson, 2014; Jones and MacGregor, 2002; Frankel, 1980; Ruck and Brown, 2015). The MSCCs have been released in several editions; the most common are the 2000 and 2009 editions (with 9 and 13 charts, respectively). Each chart contains a variable number of chips (from 16 to 42), representing the discretized HVC coordinates of the Munsell color space. However, the availability of several further thematic Munsell color sets, such as the Munsell Color Charts for Plant Tissues, The New Munsell Student Color Set, the Munsell Bead Color Book, and the Geological Rock-Color Chart, together with the lack in all sets of specific and distinct name tags for each color chip, further complicates the work of archaeologists and cultural heritage professionals (Ferguson, 2014).

Munsell color specification with the MSCCs is a manual process: archaeologists visually compare a sample with the chips they perceive as most similar. The reflectance spectra of the chips have been shown, through neurobiological research, to match the sensitivity of the Human Visual System (Conway and Livingstone, 2005). Eventually, archaeologists subjectively decide which HVC coordinates specify the color of the sample. This manual process is error-prone, time-consuming and strongly subjective. MSCCs come with a set of masks and a list of good practices to minimize the uncertainty in the color specification. The subjective ambiguities of manual color specification are related to the phenomenon of metamerism (Oleari, 2016).

Metamerism occurs when two colors appear to match under one lighting condition, but not when the light changes. In colorimetry, metamerism is a perceived match between colors with different (non-matching) spectral power distributions; colors that match in this way are called metamers. The reason for this phenomenon lies in the light source and in the way the object reflects that light to produce the perception of color. Metameric matches are quite common, especially for near-neutral colors such as grays, whites and dark colors. As colors become lighter or more saturated, the range of possible metameric matches becomes smaller. In other words, HVC coordinates with a low V tend to be more ambiguous (Milotta et al., 2018c).

Metamerism is very important from both a scientific and a practical point of view, particularly for on-site measurement campaigns. Color vision differs between observers, because the light-sensitivity profile of each type of cone varies from one person to the next. As a result, two spectrally dissimilar lights or surfaces may produce a color match for one observer, but fail to match when viewed by a second observer (observer metamerism). Furthermore, because the relative proportions of the three cone types in the retina vary from the center of the visual field to the periphery, some colors may match when viewed as small areas and fail to match when presented as large color areas (field-size metamerism). Finally, samples can appear to match at one angle of illumination but not if the angle of illumination or the angle of view is changed (geometric metamerism), and two specimens can match when viewed under one light source but not under another (illuminant metamerism).

Because of this phenomenon, light standardization is fundamental. Color is perceived in a subjective way (Goodwin, 2000). This is the reason why the CIE (Commission Internationale de l'Éclairage (CIE, 2004)) standardized colorimetric and photometric practices. In other words, the CIE defined many variables that may influence color perception, with the purpose of investigating effective ways to reduce the impact of the human factor. For instance, the illuminants (e.g., C, D65, 840, F, and UV) and the degrees of freedom of the camera (i.e., angle of view and distance) must be measured and known (ASTM, 2014), while an opaque background is recommended to reduce, or even avoid, light reflection. Following the CIE guidelines can be very expensive, as specific facilities are required to set up objective color specification procedures. These include a Color Assessment Cabinet (CAC) to constrain lighting conditions, background, and camera settings; a spectrophotometer to sample color information in a perceptually uniform color space (i.e., L*a*b*); an illuminometer to measure the light quantity on the sample surface; and a spectroradiometer to measure the spectral radiant distribution of the source. Archaeologists usually do not have access to all of these facilities at an excavation site, so samples would have to be moved to a laboratory and analyzed there, at additional cost and time. Hence, they employ the MSCCs instead. The MSCCs are so widely adopted that several international standards define how they should be employed, e.g., the American Z138.2 and ASTM D1535-14 (ASTM, 2014), the German DIN6164, and the Japanese JIS Z872.
Other standards arose to define tolerances, such as ASTM D3131-97 (ASTM, 1997) and ASTM D2244-05 (ASTM, 2005). Applications of the MSCCs include color specification of soils (USA, 1975; Sánchez-Marañón et al., 2004), of colored pottery and glass, of organic materials, and also of paintings. Indeed, through the analysis of the pictorial technique of a sample it is possible to relate artifacts to a specific date or culture (Gerharz et al., 1988).

One may then argue that the subjectivity of color specification is due to the Human Visual System (HVS). Taking pictures of the sample avoids relying on the HVS, in favour of an objective digital eye, and shifts the paradigm from manual to automatic Munsell color specification. Following the guidelines of the ASTM D1535-14 standard for image acquisition, digital cameras have been employed in laboratories under controlled conditions (i.e., fixing the variables that may influence color perception) (Aydemir et al., 2004; Rossel et al., 2008; Stanco et al., 2011b; O'Donnell et al., 2011; Stanco et al., 2012; Stanco and Gueli, 2013; Chenoweth and Farahani, 2015). In these works, automatic and semi-automatic white balancing through a Macbeth color checker enabled color specification, as well as the investigation of related topics such as defect restoration and aging estimation for pottery.

However, when color is acquired digitally through an image, even in a supervised environment, one cannot guarantee the same color values when the device changes, because the color sensitivity of the device directly influences them. This behaviour is formally described in (Finlayson et al., 2005) through the following image formation equation:

$$q_k = \int_{\omega} E(\lambda)\, S(\lambda)\, Q_k(\lambda)\, d\lambda \qquad (1)$$

where q_k is the response of the k-th sensor of the device, ω is the range of wavelengths over which the integral is taken (the visible spectrum) and λ is a single wavelength, E is the spectral power distribution of the light source (i.e., the illuminant), S is the surface reflectance, and Q_k is the spectral sensitivity of the k-th sensor. Because of this direct influence of the device sensitivity, color specification from digital images is not trivial unless a reference system is employed (e.g., a Macbeth color checker or a Konica Minolta CS-A5 white calibration plate) or a calibration process is performed beforehand.

Other research focused on general-purpose devices such as smartphones, addressing the benefits of mobile solutions in terms of user experience (Gómez-Robledo et al., 2013; Han et al., 2016). In 2017, we started a project specifically devoted to automatic Munsell color specification, named ARCA: Automatic Recognition of Color for Archaeology. Proceeding incrementally, we first designed a desktop application and investigated the capabilities of automatic Munsell color specification in outdoor, uncontrolled environments (Milotta et al., 2017; Centore, 2012). We gathered feedback from archaeologists and decided to improve the user experience of the ARCA desktop application by releasing a web-based application (Milotta et al., 2018b). We also extended our experiments to indoor, supervised environments, enlarging the ARCA dataset (Milotta et al., 2018a) and reporting formal measurements as recommended by the CIE colorimetric and photometric practices (Milotta et al., 2018c). In our previous works, we leveraged color conversions based on transformation equations and on empirical equations (i.e., formulas retrieved from several colorimetric standards and formulas computed through regression on the gathered dataset, respectively). We also investigated a combination of the two methods. Since this kind of color space conversion returns values in a continuous range, we employed a quantization step.
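To make the role of each term in Equation 1 concrete, the following is a minimal numerical sketch that replaces the integral with a sum over a wavelength grid. All spectra are toy curves invented for illustration (the Gaussian sensitivities, the two illuminants and the chip reflectance are assumptions, not measured data from this work); the point is only that changing either Q_k (the device) or E (the illuminant) changes the response q_k for the same surface.

```python
import numpy as np

# Wavelength grid over the visible range (assumed 400-700 nm in 10 nm steps).
wavelengths = np.arange(400.0, 701.0, 10.0)
step = 10.0  # nm, spacing of the grid


def gaussian(center, width):
    """Toy bell-shaped spectral curve; purely illustrative, not measured data."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)


def sensor_response(E, S, Q):
    """Riemann-sum approximation of q_k = integral of E * S * Q_k over wavelength.

    E: illuminant spectral power, shape (n,)
    S: surface reflectance, shape (n,)
    Q: sensor sensitivities, shape (3, n), one row per channel k
    """
    return (Q * (E * S)).sum(axis=1) * step


# Hypothetical sensitivities for two devices (rows: R, G, B channels).
Q_device_1 = np.stack([gaussian(600, 30), gaussian(540, 30), gaussian(460, 30)])
Q_device_2 = np.stack([gaussian(610, 40), gaussian(530, 35), gaussian(450, 25)])

# Hypothetical illuminants: a flat spectrum and a warmer, red-weighted one.
E_flat = np.ones_like(wavelengths)
E_warm = np.linspace(0.6, 1.4, wavelengths.size)

# Hypothetical reflectance of a single chip.
S_chip = 0.2 + 0.5 * gaussian(590, 50)

q_ref = sensor_response(E_flat, S_chip, Q_device_1)
q_other_device = sensor_response(E_flat, S_chip, Q_device_2)
q_other_light = sensor_response(E_warm, S_chip, Q_device_1)

print("device 1, flat light:", q_ref)
print("device 2, flat light:", q_other_device)
print("device 1, warm light:", q_other_light)
```

The three printed triplets differ even though the chip is the same, which is the generalization problem studied in the experiments.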
We then restricted the specifiable Munsell HVC coordinates to the discrete subset that is meaningful for archaeologists. In this work, we address the color conversion issue with a classification method instead of a regression one. We also further extended the ARCA dataset with new images, and performed several generalization tests to investigate the limits of the automatic Munsell color specification task. In accordance with Equation 1, we measured the impact of changes in illuminant and device sensitivity, and highlighted the limits of color specification without any reference system or calibration phase. The outcomes open the way to further research in automatic color specification, either designing an automatic calibration phase or investigating deep learning approaches. In sum, the contributions of this work, and the improvements over our previous research, are as follows:

• Extension of the ARCA dataset (about 4.5 times the images and 5.6 times the samples): we extended the ARCA dataset and made it publicly available, increasing the number of images from 328 to 1,488 and the number of samples from 56,160 to 315,333. Moreover, we added to the dataset the ground truth labels for all the images and the sampled values employed in this work, which makes the dataset more valuable than before. Images in the dataset come in several versions; in particular, the "Original" ones depict the MSCCs together with a reference white, which could be useful for future works;
• Outcomes of generalization tests of color conversion for color specification, performed with a classification approach instead of a regression one. With these tests we highlighted the limits of automatic Munsell color specification without any reference system or calibration phase;
• Insights for future works aimed at designing an automatic calibration phase or investigating deep learning approaches.

The remainder of the paper is structured as follows: the ARCA1488 dataset is described in Section 2. Methods and experiments are reported in Section 3. Finally, we discuss possible future works and conclude the paper in Section 4.

2. ARCA1488 Dataset

In this section we describe how the ARCA1488 dataset was built, focusing on the extension of ARCA328 (Section 2.1), the sampling of the MSCCs chips (Section 2.2), and the illuminants and the rendering of synthetic images (Section 2.3).

2.1. Extension of ARCA328 Dataset

ARCA328 was the previous version of our image dataset (Milotta et al., 2018a). It contains 328 images depicting MSCCs acquired in both outdoor and indoor environments, together with images of ancient pottery shards, which were used as real test cases. In the present work, we first extended the dataset by acquiring 10 MSCCs with the following 3 devices:

• Smartphone Huawei P10 Lite (referred to as S1);
• Smartphone Huawei Honor 9 (referred to as S2);
• Reflex Canon EOS 60D mounting a Tamron 24-70mm f/2.8 Di VC USD lens (referred to as R).

The experimental setting is the same as in our previous indoor work (Milotta et al., 2018a).
We acquired photos of the MSCCs Edition 2009 in a controlled environment, within a VeriVide Color Assessment Cabinet (CAC). We fixed the illuminant to D65. The light quantity on the surface of the charts was measured with an illuminometer. Images were acquired with fixed manual focus at ISO 100. Unlike in previous works, the original images were manually cropped so that all the chips of the MSCCs fall within a meaningful region of interest. Images from device R were additionally processed to remove lens distortion. Then, we post-processed the cropped images, generating equalized (EQ), white balanced (WB), and equalized plus white balanced (EQ+WB) versions. This was done in order to investigate the impact of illuminant invariance on automatic Munsell color specification (Finlayson et al., 2005). Through these steps we obtained a total of 150 additional images. We also synthetically generated 1,010 images starting from the images acquired with device S2. The rendering process of the synthetic images is detailed in Section 2.3. Hence, we increased the number of images in the ARCA dataset from 328 to 1,488. The extension of the image dataset is summarized in Figure 1. We attached to ARCA1488 the ground truth labels of the MSCCs chips, together with all the samples employed in this work. The new dataset is publicly available at: https://iplab.dmi.unict.it/ARCA1488/.

Fig. 1. ARCA328 image dataset extended to ARCA1488.

2.2. MSCCs chips sampling

The MSCCs chips sampling flow is depicted in Figure 2(a). From the MSCCs Edition 2009, we acquired 10 charts (excluding the GLEY and WHITE charts). Once the original images were manually cropped, we defined an automatic chip sampling procedure capable of gathering all 331 chips of the MSCCs at once. Samples consist of the RGB values of the chips, obtained by averaging a 5×5 pixel window approximately centered on each chip (Figure 2(b)). In order to increase the number of samples in the dataset, we designed a randomized sampling: instead of sampling a single window from each chip, we randomly sampled several windows at similar but not identical locations, within a reasonable range of the approximate center of each chip. In our experiment, we set the radius of the range to 10 pixels and the number of windows to 20 (Figure 2(c)). Since we consider 20 samples per chip instead of just one, randomized sampling increases the number of samples from 331 to 6,620. Finally, we added further variability to the randomly extracted samples by slightly (and still randomly) changing the color values of each channel.
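The randomized sampling of Section 2.2 can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used to build the dataset: the function name `sample_chip` is hypothetical, the random offsets are drawn from a square neighbourhood (the text only gives a radius of 10 pixels), and the default variability of ±3 per channel is the setting reported for the experiments in this section.

```python
import numpy as np


def sample_chip(image, center, n_windows=20, radius=10, window=5,
                jitter=3, rng=None):
    """Return (plain, varied) RGB samples for one chip.

    image:  H x W x 3 uint8 array of a cropped chart
    center: (row, col) of the approximate chip center; it is assumed to lie
            far enough from the image border for every window to fit
    """
    rng = np.random.default_rng() if rng is None else rng
    half = window // 2
    plain = []
    for _ in range(n_windows):
        # Random offset inside a square of side 2*radius+1 around the center
        # (assumption: the exact shape of the sampling range is not stated).
        dr, dc = rng.integers(-radius, radius + 1, size=2)
        r, c = center[0] + dr, center[1] + dc
        patch = image[r - half:r + half + 1, c - half:c + half + 1]
        # One sample is the mean RGB of the window.
        plain.append(patch.reshape(-1, 3).mean(axis=0))
    plain = np.asarray(plain)
    # Independent perturbation in [-jitter, +jitter] for every channel.
    noise = rng.integers(-jitter, jitter + 1, size=plain.shape)
    varied = np.clip(plain + noise, 0, 255)
    return plain, varied


# Toy usage: a uniformly colored "chip" in a synthetic image.
image = np.full((100, 100, 3), (120, 80, 60), dtype=np.uint8)
plain, varied = sample_chip(image, (50, 50), rng=np.random.default_rng(0))
print(plain.shape, varied.shape)
```

On a perfectly uniform chip all the plain samples coincide, which mirrors the observation in Section 3 that randomized samples from the same chip are almost identical until variability is added.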
In our experiment, the variability was set to ±3 and computed independently for each channel. Then, for each chip, we considered the samples both with and without variability (i.e., 13,240 samples per device). Considering the 4 versions of the images reported in Figure 1 (excluding the original images, which were not sampled) and the 3 devices, this results in 158,880 samples, whereas ARCA328 counted only 56,160 samples.

2.3. Illuminants and rendering of synthetic images

The colour of a non-self-luminous body depends on the radiation that illuminates it, so colorimetric practice poses the problem of defining the illuminating radiation that is relevant for colorimetry. The problem is not trivial, as reproducing light sources in a laboratory is complex. For instance, there is no unique daylight to refer to, as daylight depends on the hour of the day, latitude, season, atmospheric haze, etc. Practical needs impose the definition of some radiations whose spectral power distributions are considered significant. This leads to the distinction between radiations emitted by light sources and illuminants. An illuminant is a radiation with a relative spectral power distribution defined over the wavelength range that influences object-colour perception (Oleari, 2016). Over time, with the evolution of light source technology, the CIE has modified the set of Standard Illuminants (CIE, 2004, 2006). Today the CIE standard illuminants are:

1. A, which is a Planckian radiation at a temperature of about 2,848K;
2. D65, which represents the relative spectral power distribution of a phase of Daylight (D) with a correlated colour temperature of approximately 6,500K. The subscript is the temperature divided by 100.

The CIE also defines other D illuminants, namely D50, D55, D60 and D75. The Daylight illuminants are considered the most important lights in colorimetry. The CIE recommends using D65 whenever possible, in preference to the others. Other CIE illuminants are the F series, which represent various types of fluorescent lighting. This kind of lighting can vary with the combination of gases and phosphors used. Since illuminants are mathematical tables of values (relative power versus wavelength) used for colorimetric computations, it is possible to use a simulator, as in our case. A simulator of a CIE illuminant is a source with a spectral power distribution as close as possible to the chosen one, in our case the D65 standard illuminant in the CAC. Then, we artificially simulated other illuminants (such as D50, D55, and D75).
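As a small worked example of how a daylight illuminant is tied to its correlated colour temperature, the sketch below evaluates the CIE daylight-locus approximation, which gives the chromaticity (x, y) of a D illuminant from its temperature. The formula is the standard CIE one; the function name `daylight_chromaticity` is hypothetical, and nothing here is claimed to reproduce how the Blender temperature control used later in this section maps temperature to color.

```python
def daylight_chromaticity(cct):
    """CIE daylight-locus chromaticity (x, y) for a correlated colour
    temperature in kelvin, valid for 4000 K <= cct <= 25000 K."""
    if not 4000.0 <= cct <= 25000.0:
        raise ValueError("daylight locus is defined for 4000-25000 K")
    t = float(cct)
    if t <= 7000.0:
        x = 0.244063 + 0.09911e3 / t + 2.9678e6 / t**2 - 4.6070e9 / t**3
    else:
        x = 0.237040 + 0.24748e3 / t + 1.9018e6 / t**2 - 2.0064e9 / t**3
    y = -3.000 * x**2 + 2.870 * x - 0.275
    return x, y


# Nominal correlated colour temperatures of the common D illuminants.
for name, cct in [("D50", 5003), ("D55", 5503), ("D65", 6504), ("D75", 7504)]:
    x, y = daylight_chromaticity(cct)
    print(f"{name}: x={x:.4f}, y={y:.4f}")
```

The nominal temperatures are slightly above the round values in the illuminant names (e.g., 6504 K for D65) because of a later revision of a physical constant in Planck's law.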
Since all the images acquired in this work were taken with the illuminant fixed to D65, we further extended the dataset by adding synthetic images with different illuminants. The purpose is to investigate the limits of automatic Munsell color specification when the illuminant changes. Leveraging the Blender 3D software, we designed a pipeline for rendering synthetic images with different illuminants (Figure 3). This is done by changing the color temperature. For instance, illuminant D65 (our reference) has a temperature of about 6,500K. With this in mind, we selected the images acquired with device S2 and rendered 101 frames per chart with the following key-frames:

• 26 starting from 5,000K up to 5,500K;
• 25 starting from 5,500K up to 6,500K;
• 25 starting from 6,500K up to 7,500K;
• 25 starting from 7,500K up to 15,000K.

As one can see, we sampled temperatures below 7,500K more densely, as these are the most common. Through this rendering procedure we were able to add 1,010 images to the ARCA dataset. For our experiment, we applied randomized sampling with variability to the synthetic images, too. However, in this case we set the number of extracted windows to 3, gaining 100,293 additional samples, for a total of 259,173.

Fig. 2. MSCCs chips sampling. (a) Sampling flow. (b) Chip sampling. (c) Randomized sampling (for clarity, only 4 random windows are drawn, while 20 were actually extracted).

Fig. 3. Rendering of synthetic images in the Blender 3D software. The illuminant is changed by varying the Temperature slider.

3. Methods and Experiments

In our previous works, we defined the empirical equations needed to convert color values from the RGB to the Munsell color space, as done in (Gómez-Robledo et al., 2013; Rossel et al., 2006). We showed that this approach works well, but it also has severe limitations in terms of generalization (i.e., slight changes in any parameter of Equation 1 increase the error in the Munsell color specification task). In this work, we conducted generalization tests of color conversion for color specification with a classification approach instead of a regression one. For our experiments, we leveraged the Support Vector Classifier (SVC) implementation of the Python scikit-learn

library, whereas in our previous works we employed the Support Vector Regressor. Moving from a regressor to a classifier was motivated by the fact that the set of HVC coordinates in the MSCCs, commonly employed by archaeologists, is discrete. Hence, we decided to consider each chip in the MSCCs as a class to be learnt and recognized by the SVC. Having 3 devices, we obtained 3 possible configurations of the Training (TR) and Test (TE) set split:

• TR: S1+S2, TE: R
• TR: S1+R, TE: S2
• TR: S2+R, TE: S1

First, we sampled each chip only once (i.e., 331 samples per device). SVC accuracy is reported in Table 1, together with the Godlove distance, which measures the distance between HVC coordinates in the Munsell color space (Godlove, 1951). The outcomes are really poor. Hence, we assumed the amount of data was too small and applied the randomized sampling step, gaining 20 samples per chip (i.e., 6,620 samples per device). SVC accuracy for this second experiment is reported in Table 2. Accuracy and Godlove distance improve, but they are still poor. We noticed that the randomized samples were almost identical when extracted from the same chip. For this reason, we introduced the step of adding random variability to the samples. SVC accuracy for the randomized samples with additional variability is reported in Table 3. Accuracy and Godlove distance register another slight improvement, but

results are definitely still poor. Among the image versions (Cropped; WB: White Balanced; EQ: Equalized; WBEQ: White Balanced with Equalization), the EQ and WBEQ ones have the best accuracy and Godlove distance. This at least confirms that post-processing steps aimed at illuminant invariance have a small positive impact on the color specification procedure (Finlayson et al., 2005).

Table 1. Munsell evaluation (331 samples per device). S1, S2 and R stand for Smartphone 1, Smartphone 2 and Reflex, respectively. Cropped, White Balanced (WB), Equalized (EQ), and White Balanced with Equalization (WBEQ) are the 4 possible versions of the images. H, V and C stand for Hue, Value and Chroma. The last column reports the Godlove distance. Accuracy is reported as a fraction between 0 and 1.

Version   Training   Test   H      V      C      Godlove
Cropped   S1+S2      R      0.12   0.20   0.20   7.42
Cropped   S1+R       S2     0.12   0.15   0.21   7.57
Cropped   S2+R       S1     0.12   0.22   0.21   7.27
WB        S1+S2      R      0.12   0.18   0.22   7.47
WB        S1+R       S2     0.12   0.18   0.21   7.50
WB        S2+R       S1     0.12   0.19   0.22   7.40
EQ        S1+S2      R      0.14   0.25   0.25   7.17
EQ        S1+R       S2     0.15   0.22   0.20   7.31
EQ        S2+R       S1     0.13   0.23   0.26   7.21
WBEQ      S1+S2      R      0.17   0.22   0.25   7.15
WBEQ      S1+R       S2     0.15   0.24   0.23   7.31
WBEQ      S2+R       S1     0.15   0.21   0.25   7.25

Table 2. Munsell evaluation (6620 randomized samples per device).

Version   Training   Test   H      V      C      Godlove
Cropped   S1+S2      R      0.13   0.24   0.22   7.74
Cropped   S1+R       S2     0.11   0.14   0.23   9.35
Cropped   S2+R       S1     0.14   0.22   0.26   7.43
WB        S1+S2      R      0.15   0.16   0.21   9.48
WB        S1+R       S2     0.13   0.16   0.25   7.69
WB        S2+R       S1     0.15   0.20   0.28   7.46
EQ        S1+S2      R      0.18   0.26   0.31   8.71
EQ        S1+R       S2     0.14   0.23   0.26   9.07
EQ        S2+R       S1     0.20   0.28   0.33   8.33
WBEQ      S1+S2      R      0.19   0.25   0.36   8.35
WBEQ      S1+R       S2     0.20   0.24   0.31   8.76
WBEQ      S2+R       S1     0.21   0.26   0.36   7.94

Table 3. Munsell evaluation (6620 randomized samples per device with additional variability).

Version   Training   Test   H      V      C      Godlove
Cropped   S1+S2      R      0.14   0.30   0.31   6.34
Cropped   S1+R       S2     0.17   0.13   0.27   7.31
Cropped   S2+R       S1     0.17   0.35   0.28   5.94
WB        S1+S2      R      0.17   0.24   0.30   6.44
WB        S1+R       S2     0.16   0.17   0.28   7.31
WB        S2+R       S1     0.16   0.31   0.30   6.03
EQ        S1+S2      R      0.24   0.41   0.38   6.02
EQ        S1+R       S2     0.24   0.37   0.34   5.72
EQ        S2+R       S1     0.25   0.43   0.39   5.44
WBEQ      S1+S2      R      0.24   0.40   0.45   5.22
WBEQ      S1+R       S2     0.24   0.39   0.39   5.48
WBEQ      S2+R       S1     0.24   0.42   0.41   5.03

Then, we investigated the capabilities of an SVC trained on the set of synthetic images, which were rendered from the images acquired by device S2. We first adopted a training-test split obtained by randomly picking 66% of the synthetic images for the training set and the remainder for the test set. SVC accuracy is reported in Table 4. From these outcomes, it is clear that the classifier overfits: results are perfect on the training set and nearly perfect on the test set, but poor on the validation sets (with an expected slightly better accuracy on the set from device S2, which was used to generate the synthetic images).

Table 4. Munsell evaluation (synthetic images). TR, TE and VA stand for Training, Test and Validation sets, respectively. S1, S2 and R stand for Smartphone 1, Smartphone 2 and Reflex, respectively. S1v, S2v and Rv denote the sets where variability was added during chip sampling. H, V and C stand for Hue, Value and Chroma. The last column reports the Godlove distance. Accuracy is reported as a fraction between 0 and 1.

Set        H      V      C      Godlove
TR         1.0    1.0    1.0    0.0
TE         0.99   1.0    0.99   0.03
VA: S1     0.17   0.14   0.24   8.61
VA: S1v    0.16   0.14   0.23   8.71
VA: S2     0.19   0.35   0.25   7.87
VA: S2v    0.17   0.36   0.25   7.75
VA: R      0.12   0.15   0.24   8.71
VA: Rv     0.12   0.17   0.23   8.48

Thereafter, we changed the training-test split strategy, leveraging the randomized sampling and taking 3 samples per chip (i.e., 2 samples in the training set and 1 sample in the test set). SVC accuracy is reported in Table 5. Good performance on the training and test sets is confirmed and, while we register slight improvements, results are still poor in this setting, too.

Table 5. Munsell evaluation (synthetic images with additional variability).

Set        H      V      C      Godlove
TRv        0.92   0.99   0.95   0.14
TEv        0.77   0.97   0.86   0.53
VA: S1     0.19   0.13   0.27   7.22
VA: S1v    0.20   0.15   0.25   7.19
VA: S2     0.51   0.90   0.61   1.47
VA: S2v    0.40   0.81   0.53   2.50
VA: R      0.12   0.19   0.33   6.79
VA: Rv     0.15   0.20   0.34   6.69
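The evaluation protocol of this section (one class per chip, training on two devices and testing on the third, with accuracy and Godlove distance as metrics) can be sketched as follows. This is a toy illustration, not the code behind the tables: the data are simulated, the per-device channel gains in `device_gain` are hypothetical stand-ins for the sensor sensitivity of Equation 1, the SVC uses scikit-learn defaults because the hyper-parameters are not reported here, and `godlove_distance` implements the Godlove (1951) formula as it is commonly stated, assuming hue is expressed as a number on the 0-100 Munsell hue circle.

```python
import math
import numpy as np
from sklearn.svm import SVC


def godlove_distance(hvc1, hvc2):
    """Godlove colour difference between two (hue, value, chroma) triplets.

    Hue is assumed to be already converted to a number on the 0-100
    Munsell hue circle.
    """
    h1, v1, c1 = hvc1
    h2, v2, c2 = hvc2
    hue_term = 2.0 * c1 * c2 * (1.0 - math.cos(2.0 * math.pi * (h1 - h2) / 100.0))
    return math.sqrt(hue_term + (c1 - c2) ** 2 + (4.0 * (v1 - v2)) ** 2)


rng = np.random.default_rng(0)
n_chips, n_per_chip = 30, 20            # toy sizes, far smaller than 331 chips
true_rgb = rng.uniform(40, 220, size=(n_chips, 3))

# Hypothetical per-device channel gains (assumption, for illustration only).
device_gain = {
    "S1": [1.00, 1.00, 1.00],
    "S2": [1.10, 0.95, 0.90],
    "R": [0.90, 1.05, 1.10],
}


def acquire(device):
    """Simulate noisy RGB samples of every chip as seen by one device."""
    gains = np.asarray(device_gain[device])
    X = np.repeat(true_rgb * gains, n_per_chip, axis=0)
    X = X + rng.integers(-3, 4, size=X.shape)   # +/-3 variability per channel
    y = np.repeat(np.arange(n_chips), n_per_chip)  # one class per chip
    return X, y


data = {d: acquire(d) for d in device_gain}
results = {}
for test_device in data:
    train = [d for d in data if d != test_device]
    X_tr = np.vstack([data[d][0] for d in train])
    y_tr = np.concatenate([data[d][1] for d in train])
    clf = SVC()  # default RBF kernel
    clf.fit(X_tr, y_tr)
    X_te, y_te = data[test_device]
    results[test_device] = clf.score(X_te, y_te)
    print(f"TR: {'+'.join(train)}, TE: {test_device}, "
          f"accuracy: {results[test_device]:.2f}")

# Godlove distance between two chips one Value step apart.
print(godlove_distance((10, 5, 4), (10, 6, 4)))
```

In a full evaluation each predicted class would be mapped back to its HVC triplet, so that per-coordinate accuracy and the mean Godlove distance between predicted and true chips can be computed.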

Acknowledgements

This research is supported by Piano della Ricerca 2016-2018 linea di Intervento 2 of DMI of the University of Catania.

4. Conclusion and Future Works

In this work we discussed the limits of automatic Munsell color specification. Color specification is the process of measuring the color of a sample in a given color space. We focused on the Munsell color space because archaeologists are used to employing the so-called Munsell Soil Color Charts (MSCCs), a collection of discrete color coordinates from the Munsell system, directly at excavation sites. For these scholars and researchers, being able to perform Munsell color specification automatically is crucial, as they spend a lot of time subjectively specifying colors in the Munsell system. We recapped the related works and our previous research in this field, and proposed several contributions in this new study. Most importantly, we extended and made publicly available the ARCA dataset, increasing the number of images from 328 to 1488, and the number of samples from 56,160 to 315,333. Moreover, we added to the dataset the ground truth labels for all the images and the sampled values employed in this work, resulting in a dataset more valuable than before. Images in the dataset come in several versions; in particular, the “Original” ones depict the MSCCs together with a reference white, which could be useful for future works. We conducted generalization tests of color conversion for color specification. We adopted a classification approach instead of a regression one. This choice was motivated by the fact that the set of all the possible HVC coordinates in the MSCCs, commonly employed by archaeologists, is a discrete one. Hence, we decided to consider each chip in the MSCCs as a class to be learnt and recognized by the SVC. With these tests we highlighted the limits of automatic Munsell color specification without any reference system or calibration phase. Due to the high variability introduced by Equation 1, it is clear that a classifier can be trained until it fits the training data perfectly and yet generalizes very poorly.
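The classification setting summarized above, with one class per MSCC chip and evaluation on a device unseen during training, can be illustrated with a small sketch. This is not the SVC pipeline used in this work: a nearest-centroid classifier is swapped in to keep the example dependency-free, and the RGB values, the chip selection and all function names are hypothetical toy choices.

```python
from collections import defaultdict


def fit_centroids(samples):
    """samples: iterable of (rgb, chip_label). Returns {chip_label: mean rgb}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for rgb, label in samples:
        for i in range(3):
            sums[label][i] += rgb[i]
        counts[label] += 1
    return {lab: tuple(s / counts[lab] for s in sums[lab]) for lab in sums}


def predict(centroids, rgb):
    """Return the chip label whose centroid is closest to rgb."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(rgb, centroids[label]))
    return min(centroids, key=sq_dist)


def accuracy(centroids, samples):
    """Fraction of samples whose predicted chip equals the true chip."""
    samples = list(samples)
    hits = sum(predict(centroids, rgb) == label for rgb, label in samples)
    return hits / len(samples)


# Toy samples standing in for two training devices (hypothetical values).
TRAIN = [
    ((150, 80, 60), "10R 4/6"), ((154, 84, 62), "10R 4/6"),
    ((170, 110, 80), "2.5YR 5/6"), ((174, 112, 84), "2.5YR 5/6"),
]
# The same chips as seen by a brighter, held-out device (hypothetical values).
HELD_OUT = [
    ((175, 105, 80), "10R 4/6"),
    ((195, 135, 100), "2.5YR 5/6"),
]

if __name__ == "__main__":
    model = fit_centroids(TRAIN)
    print("training accuracy:", accuracy(model, TRAIN))
    print("held-out device accuracy:", accuracy(model, HELD_OUT))
```

In this toy case the held-out device renders every chip brighter, so the darker chip falls closer to the centroid of the lighter one. The model fits the training devices exactly but confuses neighbouring chips on the unseen device, which mirrors in miniature the gap between training and validation accuracy discussed above.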
As for future work, we first surveyed the market, finding tools capable of calibrating themselves autonomously and of acquiring pictures in a controlled environment with very reliable results (e.g., the XRite Munsell CAPSURE Color Matching Tool). However, our main goal remains to define a solution that is cheap and affordable for any scholar or researcher. Professional tools like the ones recommended by the CIE are good, but they come at a cost. Having established through this work that neither transformation equations nor classifiers are able to generalize an automatic Munsell color specification model, we are planning to design a solution based on computer vision that automatically recognizes a reference marker (e.g., a White Calibration Plate CM-A145, or a Macbeth ColorChecker). Moreover, the synthetic image rendering procedure described in this work opens the path to the fast creation of a new extended dataset made of images with a very wide range of light conditions. A dataset of this kind could certainly be employed in a future investigation based on deep learning techniques.
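One simple way a recognized reference white could be used is a per-channel gain correction. The sketch below is only an illustration under stated assumptions, not the solution planned above: it assumes that the mean RGB of the white marker has already been measured in the image, it works directly on 8-bit RGB values without linearization, and the function name and default target white are hypothetical.

```python
def white_balance(pixel, measured_white, target_white=(255.0, 255.0, 255.0)):
    """Rescale a pixel so that the measured reference white maps to target_white.

    pixel, measured_white, target_white: (R, G, B) tuples in the 0-255 range.
    Each channel is multiplied by its own gain and clipped at 255.
    """
    if any(w <= 0 for w in measured_white):
        raise ValueError("measured white must be positive in every channel")
    return tuple(
        min(255.0, p * t / w)
        for p, t, w in zip(pixel, target_white, measured_white)
    )
```

For example, with a white plate measured as (200, 240, 180), a pixel (100, 120, 90) becomes a neutral gray, since each channel is exactly half of the corresponding white value. A correction of this kind compensates only for a global color cast and exposure; it does not account for device-specific tone curves.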

References

ASTM, 1997. Standard practice for establishing color and gloss tolerances. ASTM International D 3134-97.
ASTM, 2005. Standard practice for calculation of color tolerances and color differences from instrumentally measured color coordinates. ASTM International D 2244-05.
ASTM, 2014. Standard practice for specifying color by the munsell system. ASTM International D 1535-14.
Aydemir, S., Keskin, S., Drees, L., 2004. Quantification of soil features using digital image processing (dip) techniques. Geoderma 119, 1–8.
Centore, P., 2012. An open-source inversion algorithm for the munsell renotation. Color Research & Application 37, 455–464.
Chenoweth, J.M., Farahani, A., 2015. Color in historical ceramic typologies: A test case in statistical analysis of replicable measurements. Journal of Archaeological Science: Reports 4, 310–319.
CIE, 2004. Colorimetry, Publication 15:2004. 3rd ed., Bureau, C.C.
CIE, 2006. ISO 11664-2:2007 (CIE S 014-2/E:2006) Colorimetry – Part 2: CIE standard illuminants. URL: http://www.cie.co.at/publications/colorimetry-part-2-cie-standard-illuminants-colorimetry.
Conway, B., Livingstone, M., 2005. A different point of hue. Proceedings of the National Academy of Sciences of the United States of America 102, 10761–10762.
Ferguson, J., 2014. Munsell notations and color names: Recommendations for archaeological practice. Journal of Field Archaeology 39, 327–335.
Finlayson, G., Hordley, S., Schaefer, G., Tian, G.Y., 2005. Illuminant and device invariant colour using histogram equalisation. Pattern recognition 38, 179–190.
Frankel, D., 1980. Munsell colour notation in ceramic description: an experiment. Australian archaeology, 33–37.
Gerharz, R.R., Lantermann, R., Spennemann, D.R., 1988. Munsell color charts: a necessity for archaeologists? The Australian Journal of Historical Archaeology, 88–95.
Godlove, I., 1951. Improved color-difference formula, with applications to the perceptibility and acceptability of fadings. JOSA 41, 760–772.
Gómez-Robledo, L., López-Ruiz, N., Melgosa, M., Palma, A.J., Capitán-Vallvey, L.F., Sánchez-Marañón, M., 2013. Using the mobile phone as munsell soil-colour sensor: An experiment under controlled illumination conditions. Computers and electronics in agriculture 99, 200–208.
Goodwin, C., 2000. Practices of color classification. Mind, culture, and activity 7, 19–36.
Gueli, A.M., Pedullà, E., Pasquale, S., La Rosa, G.R., Rapisarda, E., 2017. Color specification of two new resin composites and influence of stratification on their chromatic perception. Color Research & Application 42, 684–692.
Han, P., Dong, D., Zhao, X., Jiao, L., Lang, Y., 2016. A smartphone-based soil color sensor: For soil type classification. Computers and Electronics in Agriculture 123, 232–241.
Jones, A., MacGregor, G., 2002. Colouring the past: the significance of colour in archaeological research. Berg.
Klein, G.A., Meyrath, T., 2010. Industrial color physics. volume 7. Springer.
Koh, B.K., 2007. Color difference and acrylamide content of cooked food. American Journal of Food Technology 2, 318–322.
Kuehni, R.G., 2002. The early development of the munsell system. Color Research & Application: Endorsed by Inter-Society Color Council, The Colour Group (Great Britain), Canadian Society for Color, Color Science Association of Japan, Dutch Society for the Study of Color, The Swedish Colour Centre Foundation, Colour Society of Australia, Centre Français de la Couleur 27, 20–27.
MacAdam, D., 1935. The theory of the maximum visual efficiency of colored materials. JOSA 25, 249–252.
Milotta, F.L., Stanco, F., Tanasi, D., 2017. Arca (automatic recognition of color for archaeology): A desktop application for munsell estimation, in: International Conference on Image Analysis and Processing, Springer. pp. 661–671.

Milotta, F.L., Stanco, F., Tanasi, D., Gueli, A.M., 2018a. Munsell color specification using arca (automatic recognition of color for archaeology). Journal on Computing and Cultural Heritage (JOCCH) 11, 17.
Milotta, F.L.M., Quattrocchi, C., Stanco, F., Tanasi, D., Pasquale, S., Gueli, A.M., 2018b. Arca 2.0: Automatic recognition of color for archaeology through a web-application, in: IEEE International Conference Metrology for Archaeology and Cultural Heritage, IEEE. pp. 461–465.
Milotta, F.L.M., Tanasi, D., Stanco, F., Pasquale, S., Stella, G., Gueli, A.M., 2018c. Automatic color classification via munsell system for archaeology. Color Research & Application 43, 929–938.
Munsell, A., 1915. Atlas of the Munsell color system. Wadsworth, Howland & Company, Incorporated, Printers.
O’Donnell, T.K., Goyne, K.W., Miles, R.J., Baffaut, C., Anderson, S.H., Sudduth, K.A., 2011. Determination of representative elementary areas for soil redoximorphic features identified by digital image processing. Geoderma 161, 138–146.
Oleari, C., 2016. Standard colorimetry: definitions, algorithms and software. John Wiley & Sons.
Pegalajar, M., Ruiz, L., Sánchez-Marañón, M., Mansilla, L., 2019. A munsell colour-based approach for soil classification using fuzzy logic and artificial neural networks. Fuzzy Sets and Systems.
Rodrigues, A., 2004. Color technology and paint, in: Association Internationale de la Couleur (AIC) Color and Paints, Interim Meeting of the International Color Association Proceedings, pp. 103–108.
Rossel, R.V., Fouad, Y., Walter, C., 2008. Using a digital camera to measure soil organic carbon and iron contents. Biosystems Engineering 100, 149–159.
Rossel, R.V., Minasny, B., Roudier, P., McBratney, A., 2006. Colour space models for soil science. Geoderma 133, 320–337.
Ruck, L., Brown, C.T., 2015. Quantitative analysis of munsell color data from archeological ceramics. Journal of Archaeological Science: Reports 3, 549–557.
Sánchez-Marañón, M., Soriano, M., Melgosa, M., Delgado, G., Delgado, R., 2004. Quantifying the effects of aggregation, particle size and components on the colour of mediterranean soils. European Journal of Soil Science 55, 551–565. URL: http://dx.doi.org/10.1111/j.1365-2389.2004.00624.x, doi:10.1111/j.1365-2389.2004.00624.x.
Stanco, F., Battiato, S., Gallo, G., 2011a. Digital imaging for cultural heritage preservation: Analysis, restoration, and reconstruction of ancient artworks. CRC Press.
Stanco, F., Gueli, A.M., 2013. Computer graphics solutions for pottery colors specification, in: Digital Photography IX, International Society for Optics and Photonics. p. 86600S.
Stanco, F., Tanasi, D., Bruna, A., Maugeri, V., 2011b. Automatic color detection of archaeological pottery with munsell system, in: International Conference on Image Analysis and Processing, Springer. pp. 337–346.
Stanco, F., Tanasi, D., Gueli, A.M., Stella, G., 2012. Computer graphics solutions for dealing with colors in archaeology, in: Conference on Colour in Graphics, Imaging, and Vision, Society for Imaging Science and Technology. pp. 97–101.
USA, S.S.S., 1975. Soil Taxonomy: A basic system of soil classification for making and interpreting soil surveys. US Government Printing Office.
Zacharias, N., 2018. Critical assessment of chromatic index in archaeological ceramics by munsell and rgb: novel contribution to characterization and provenance studies. Mediterranean Archaeology and Archaeometry 18, 175–212.

AUTHOR DECLARATION We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome. We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us. We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing we confirm that we have followed the regulations of our institutions concerning intellectual property. We understand that the Corresponding Author is the sole contact for the Editorial process (including Editorial Manager and direct communications with the office). He/she is responsible for communicating with the other authors about progress, submissions of revisions and final approval of proofs. We confirm that we have provided a current, correct email address which is accessible by the Corresponding Author and which has been configured to accept email from.

Signed by the corresponding author (Filippo LM Milotta), 15 JUL 19

Information about the manuscript and authors:
Title of the Manuscript: Challenges in automatic Munsell color profiling for cultural heritage
Authors full names: Filippo Luigi Maria Milotta, Giuseppe Furnari, Camillo Quattrocchi, Stefania Pasquale, Dario Allegra, Anna Maria Gueli, Filippo Stanco, Davide Tanasi
Corresponding author: Filippo L.M. Milotta
Corresponding author email: [email protected]