Using patch-based image synthesis to measure perceptual texture similarity

Rodrigo Martín, Min Xue, Reinhard Klein, Matthias B. Hullin, Michael Weinmann

Accepted manuscript, to appear in: Computers & Graphics (2019). DOI: https://doi.org/10.1016/j.cag.2019.04.001

Received 16 October 2018; revised 1 March 2019; accepted 15 April 2019

Highlights

• Novel strategy for the measurement of mutual perceptual texture similarity
• Generation of continuous transitions between textures via texture interpolation
• User-based localization of intermediate interpolated specimens in the continuum
• Localization uncertainty is related to the perceived pairwise texture similarity
• Suitability for addressing fine-grained similarities


Graphical Abstract

[Figure: the derived two-dimensional perceptual texture space (dimensions S1 and S2), in which the texture samples are embedded and the rated feature axes are superimposed (rough–smooth, low–high contrast, non-repetitive–repetitive, non-directional–directional, low–high complexity, non-oriented–locally oriented, non-granular–granular, low–high feature density, coarse–fine, non-uniform–uniform, irregular–regular, non-random–random).]


Using patch-based image synthesis to measure perceptual texture similarity

Rodrigo Martín*, Min Xue, Reinhard Klein, Matthias B. Hullin, Michael Weinmann

Institute of Computer Science II, University of Bonn, Endenicher Allee 19a, 53115 Bonn, Germany

ARTICLE INFO

Article history: Received April 30, 2019

Keywords: Texture similarity, Texture synthesis, Visual perception, Digital material appearance

ABSTRACT

The perceptual similarity of textures has gained considerable attention in the computer vision and graphics communities. Here, we focus on the challenging task of estimating the mutual perceptual similarity between two textures from materials on a consistent scale. Unlike previous studies that more or less directly queried pairwise similarity from human subjects, we propose an indirect approach that is inspired by the notion of just-noticeable differences (JND). Similar metrics are common in imaging and color science, but have so far not been directly transferred to textures, since they require the generation of intermediate stimuli. Using patch-based statistical texture synthesis, we produce continuous transitions between pairs of textures. In a user experiment, participants are then asked to locate an interpolated specimen in the linear continuum. Our intuition is that the JND, defined as the uncertainty with which participants perform this task, is closely related to the perceived pairwise texture similarity. Using a dataset of fabric textures, we show that this metric is particularly suitable to address fine-grained similarities, produces approximately interval-scale measurements and is additionally convenient for crowdsourcing. We further validate our approach by comparing it with a well-established data collection technique using the same dataset.

© 2019 Elsevier B.V. All rights reserved.

1. Introduction


The perception of textures is an essential field of research that has a broad range of applications in numerous vision problems. While textures play a decisive role in the visual appearance of materials, they also serve as a cue in many understanding tasks such as judging material attributes (e.g. [1]), material recognition and classification (e.g. [2, 3, 4]) and object detection (e.g. [5]). The main challenge when dealing with textures is that their effect on human perception is very complex and highly subjective. In this paper, we focus on the problem of perceived texture similarity. Questions like "Are textures A and B more similar than textures C and D?" arise in many applications, for instance when reasoning whether objects are made of the same or a similar

*Corresponding author. E-mail: [email protected] (Rodrigo Martín)

material or when searching for materials that are perceptually in-between certain other given samples. Reliable texture similarity metrics could greatly facilitate the development of efficient user interfaces for retail web stores that are consistent with human perception, or could be used to guide design processes, find suitable replacements for unavailable materials, and so on. In color science, perceptually uniform color spaces such as CIELAB and CIELUV have reached a mature state and enabled highly efficient digital solutions for applications like gamut mapping for display and print [6]. Textures, however, introduce additional spatial structure and possibly semantic context, which makes any measure of similarity much less well-posed and objective. Pioneering research in perceptual texture similarity has aimed at identifying a meaningful, low-dimensional and perceptually uniform space for textures [7, 8], where the axes represent meaningful quantities and distances between points in this space are proportional to variations in human perception. These studies and many that have built upon them rely on

free-grouping experiments to collect perceptual data. This input is later shaped into a similarity/distance matrix from which a perceptual texture space (PTS) is constructed by means of dimensionality reduction techniques. Although this approach has proven useful in the task of deriving different similarity metrics, the experimental procedure may become impractical for large datasets due to the substantial duration of the grouping process, which may lead to fatigue and boredom issues. In fact, arranging e.g. a large collection of printed images into clusters has been shown to be a time-consuming experiment [9], where only six out of thirty participants were able to complete the whole assignment. Moreover, this experimental design does not allow the addition of unseen data points to the resulting similarity matrix, whose sparsity increases along with the size of the dataset. Several valid alternatives have been proposed to collect similarity data while partially overcoming the previous issues, e.g. progressive grouping, pairwise comparisons or forced-choice combinations. However, they depend on direct user queries for pairwise similarity between textures, a concept that is ill-defined and depends on the individual subject experience, being therefore subject to bias.

In this paper, we introduce a novel strategy for the measurement of mutual perceptual similarity between textures from materials. Our proposal offers precise control of the data collection for specific pairs while mitigating some of the obstacles of the previous techniques. Specifically, we rely on a localization task inspired by the notion of just-noticeable differences (JND) and employ a well-established technique in computer graphics, patch-based texture synthesis, to produce a gradual transition, or gradient, between two textures A and B. An additional intermediate texture exemplar C, generated using the same method, is then given to the user, along with the task of locating it on the continuum from A to B. The working hypothesis for our metric is that the more similar the two textures at the ends of the gradient are, the less certain the user will be about the placement of C. For multiple users tasked with the same question, this will result in a variance which we expect to correlate with the perceived similarity between textures. As the texture interpolation itself is not driven by perceptual metrics, the synthesized transition between the textures may not represent a smooth, even gradient between materials. We therefore propose a preliminary study that allows the generation of perceptually linear interpolations. Our approach addresses the medium and high end of the similarity scale described by Zujovic et al. [10], is easily parallelizable among multiple subjects, extensible to additional samples and produces approximately interval-scale measurements. Thanks to these characteristics, the methodology is suited for implementation on human intelligence platforms like Amazon Mechanical Turk and results in a space of textures that complements previous investigations. Altogether, the main contributions of this paper are:

• The introduction of a novel formulation for the perceptual similarity between textures that exploits the noise inherent to a JND-like task. To generate the stimuli used for the experiments, we make use of state-of-the-art texture interpolation.

• To derive a perceptually linear interpolation between textures, which is so far not guaranteed by current techniques, we present a linearization method based on user judgments.

• From our procedure, a meaningful perceptual space for material textures is computed. We determine the soundness of our approach by comparing it with the perceptual space resulting from a free-grouping experiment.

2. Related work

Establishing a texture similarity measure is among the essential aspects that need to be considered to understand the appearance of materials. Therefore, we review major research on the perception of digital material appearance as well as related work regarding the similarity of textures and the practical acquisition of psychophysical data. As our approach involves example-based techniques for texture synthesis in order to generate the experimental stimuli, we also report the main developments in this regard.

2.1. Perception of digital material appearance

Undoubtedly, the way we perceive our surroundings guides our interaction with objects. Even by solely considering the materials objects are made of, we are able to discriminate between the different material classes based on their visual similarity. As a consequence, there has been a considerable amount of work studying the fundamentals of the perception of material surfaces and their properties [11]. The perceived dimensionality of real-world materials has also been an important matter of research, often based on a set of relevant perceptual qualities [1, 12]. Other investigations instead focus on the perception of digital material representations and, more specifically, on the exploration of a perceptual space for material gloss [13] or the development of a control space for predictable editing of BRDF data [14]. However, despite the increasing reproduction quality of digital materials achieved by current computer graphics algorithms, these are still not capable of accurately preserving all the fine details that contribute to material appearance as well as simple photographs would do [15]. For this reason, many material catalogs and online fashion stores illustrate their collections using pictures, often offering close-up views of the texture features, and even physical samples [16] instead of their virtual equivalents, regardless of their potential benefits.

2.2. Perceptual texture similarity

The visual perception of textures and their features has been intensively studied in the past [17]. Measuring texture similarity is a key aspect of texture perception which has gained particular attention in recent years due to its relevance for several operating domains. In Zujovic et al. [10], the authors describe the differences between quantitative and qualitative similarity and their applications. Quantitative metrics address the monotonic relationship between measured and perceived similarity and have implications in the domain of image compression and content-based image retrieval. In contrast, a valid qualitative

metric should distinguish between similar and dissimilar textures, as required by wide-spread applications such as image classification and scene understanding. The strategy proposed in this paper belongs to the first category of metrics and addresses the computation of fine-grained distances between textures from different materials that still share similar texture characteristics, e.g. sample scale, spatial structure or semantic content.

One of the first efforts in this direction has been presented by Rao and Lohse [7], who identified some relevant perceptual dimensions for textures. Their research employs an experimental methodology for the acquisition of image similarity data from human annotations based on a free-grouping task, which has been followed to a greater or lesser extent in later studies [9, 18, 19, 8]. These investigations focus on the exploration of a low-dimensional space of textures and its relevant dimensions, while using different analysis techniques and texture databases composed either of natural images, procedural textures or rendered heightmaps. In an investigation by Balas [20], the same experimental principle has been applied to reformulate texture similarity as a categorization task in order to compare the efficiency of several parametric texture models. Alternatively, Gurnsey and Fleet [21] computed a perceptual space for noisy artificial textures using an experimental design based on triads.

Certainly, the evaluation of texture similarity would profit from more computational approaches. A comprehensive review of the existing structural texture similarity metrics that integrate insights from human perception in order to allow for point-by-point differences in textured regions is provided by Pappas et al. [22]. On the other hand, many different computational texture features for image classification have demonstrated good performance on texture databases (e.g. Gabor filters [23], Local Binary Patterns [24] and filter banks [25]), although they do not always correlate well with observations made by humans [9], especially for image pairs that appear dissimilar to the observers. Furthermore, Dong et al. [26] investigate the capability of a broad set of computational features to estimate texture similarity in comparison to perceptual rankings in a unified framework. Despite obtaining interesting insights on which features correlate better with perceptual data, none of the employed features was able to fully replicate human performance. In addition, hand-crafted features and deep image features from convolutional neural networks (CNNs) were used to train regression models against perceptual data from free-grouping experiments [27]. Their results indicate that a combination of deep features correlates slightly better with user ratings. Motivated by the intuition that visual traits influence the human perception of material similarity, Schwartz and Nishino [2] address material recognition by exploiting human perception of visual similarity. In their research, ground-truth perceptual information is gathered via crowdsourcing, where small image patches from natural images are shown to users who rate whether they look similar or not. Material distances are then computed in terms of the average probability of similarity to the different categories. With our approach, we avoid directly querying annotators for texture similarity and allow them to perform the ratings on a consistent scale.

2.3. Practical acquisition of psychophysical data

As indicated earlier, the assessment of texture similarity generally involves a methodology which comprises the computation of a perceptual similarity matrix (PSM) derived from a free-grouping experiment. However, this procedure and the resulting perceptual similarities raise several points of concern. As the dataset grows larger, the user task may become rather impractical, since a considerable number of textures has to be compared simultaneously. Indeed, it has been demonstrated that the selection of a data collection method is critical in data positioning studies. Concretely, methods which entail larger completion times significantly affect fatigue, boredom, the amount of missing values and the obtained perceptual map [28]. Besides, this setup cannot be parallelized among several subjects, does not support collecting similarities for previously unseen points, and the derived PSM may become very sparse, due to the shrinking possibility of two samples being grouped together for increasing sizes of the dataset. Alternatively, Rogowitz et al. [8] analyzed two different psychophysical tasks. In the first, participants arrange the images on a table and the physical distances between samples are measured to obtain a dense similarity matrix. In the second, a forced-choice task is presented to the users to select the most similar image to a reference in random batches of eight samples shown at a time. Although the obtained perceptual spaces resemble one another, the one from the forced-choice experiment exhibits a less-informative circular structure. Liu et al. [19] address the sparsity problem by dividing the database into smaller subsets to be grouped individually and later merged into larger clusters, until all groups are merged together. This procedure allows assigning an extra confidence score to each action based on the visual similarity between groups of textures. An additional progressive grouping assignment has been proposed by Zujovic et al. [10]. This method allows the parallelization of the data collection process across different subjects and minimizes the completion time, although it still produces a sparse PSM which cannot be extended to new data points.

Our proposed setup relies on pairwise texture similarities obtained from micro-tasks performed by participants on a crowdsourcing system, thus overcoming issues related to human factors, the parallelizability of the task and the extension of the database to unseen textures. Besides, our approach offers precise control of the data collection process and, therefore, the possibility of obtaining a dense similarity matrix. Other methods for data acquisition, like pairwise comparisons or triads, may be similarly used for crowdsourcing and grant control of the data collection for specific pairs. However, these techniques depend on subjective magnitude judgments which are highly prone to bias [8]. Instead, we propose to exploit the noise inherent to a localization task as a means to gather high-quality similarity assessments. A comparison between the predominant methods to collect similarity data is provided in Table 1, including the complexity of evaluating the entire database, their capacity to be parallelized across users or extended to new points, the influence of human factors (fatigue and boredom) in experiments with potentially large databases, the sparsity of the computed PSM and the overall quality of the collected data based on the derived PTS.


Table 1: Benefits and drawbacks of different data collection methods. This comparison considers the complexity of evaluating the entire database, the ability of each method to be parallelized across users or to be extended to unseen points, the influence of fatigue and boredom (human factors) in potentially large databases, the sparsity of the derived matrix and the general quality of the collected data based on the resulting PTS.

Approach                                 | Task complexity        | Parallelizable | Human factors | Extensible | Matrix sparsity | Data quality
Free-grouping [20, 28, 9, 18, 19, 7, 8]  | O(n log(n))            | no             | ✗✗            | no         | ✗               | ✓
Progressive grouping [10]                | O(n log(n) − n log(m)) | yes            | ✓             | no         | ✗               | ✓
Conditional ranking [28]                 | O(n²)                  | yes            | ✗             | no         | ✓               | –
Pairwise comparisons [28]                | O(n²)                  | yes            | ✓✓            | yes        | ✓               | –
Triadic combinations [28, 21]            | O(n³)                  | yes            | ✓✓            | yes        | ✓               | –
Multiple forced-choice [8, 2]            | O(kn)                  | yes            | ✓             | no         | ✗               | ✗
Our method                               | O(n²)                  | yes            | ✓✓            | yes        | ✓               | ✓

✓ Good performance; ✗ Poor performance; – Not enough evidence. n = size of the database; m = size of the subset; k = stimuli presented at a time.

2.4. Example-based texture synthesis


Beyond the analysis of texture perception and the features that contribute to this task, several investigations approach the generation of synthetic textures while trying to preserve the characteristics of given real-world samples. This allows producing texture variations that depict the same material as well as creating larger exemplars. A comprehensive survey on this research domain, covering a broad set of synthesis techniques from simple pixel-based to the more complex patch-based synthesis [30, 31], is provided by Wei et al. [29]; recently emerging developments in deep learning can also be utilized for texture synthesis [32].

In addition to producing data-driven variations or bigger exemplars, texture synthesis can also be applied to generate a transition region between two source images, such that inconsistencies in color, texture and structure change gradually from one source to the other. In this regard, Ruiters et al. [33] combined patch-based and statistical approaches to produce high-quality interpolated texture patches between two sources, however at rather long computation times. Furthermore, their method requires a certain degree of user input, as the synthesis is guided by manually-generated feature maps. Later, Darabi et al. [34] developed a technique called Image Melding, which combines the potential of patch-based and gradient-domain approaches. This procedure builds on a modified version of the PatchMatch algorithm [35] and computes nearest-neighbor patch correspondences to successfully interpolate between textures with rather different structures, without sacrificing texture sharpness during the process. The procedure is fully automatic and produces convincing sequences with an acceptable time performance (in the order of minutes). More recently, deep learning has been applied for texture synthesis and interpolation [36, 37]. However, the interpolations are performed only for relatively small image patches and the results tend to be noisy due to artifacts produced by the neural networks. In general, despite the high degree of quality achieved by current methods, none of the existing solutions is capable of solving the texture synthesis problem for all kinds of image property variations (e.g. scale, stochasticity, stationarity, etc.) [38].

In order to generate the stimuli for our work, we applied the Image Melding method (IM) [34], mainly due to its excellent trade-off between time performance and visual quality. Nevertheless, our methodology for computing texture similarities is not constrained to the utilization of one particular synthesis algorithm.


3. Stimuli

Before providing a detailed discussion regarding the perceptual similarity measure in the next section, we introduce the image stimuli involved, obtained by selecting a suitable texture dataset and applying a few pre-processing operations. The diagram for the generation of the stimuli and its embedding into the overall methodology is illustrated in Figure 1.

3.1. Selection of texture database

For the present study, we focused mainly on fabric materials, since they share certain textural features (e.g. scale, structure or semantic content) while exhibiting a considerably diverse range of appearances. Besides, they have been widely used in previous research and are of great importance in many industrial applications, with a particular interest in the e-commerce of textile materials for the fashion industry. In order to gather such a collection of fabric textures, we inspected existing material datasets, which are extensively reviewed in the survey provided by Hossain and Serikawa [39]. We then discarded collections of textures captured under uncontrolled lighting or viewing conditions (e.g. Brodatz [40], MeasTex [41], UMD database [42]), as it has been demonstrated that changes in illumination drastically affect the appearance of textures and materials [43] and would significantly bias the users' judgments. Likewise, databases in which samples were not annotated or which contained too few fabric-like images (e.g. CUReT [44], PhoTex [45], PerTex [18]) were equally discarded. As a result, in the scope of our investigation we make use of a subset of textures from the Amsterdam Library of Textures (ALOT) database [46], which fulfills the mentioned prerequisites. This collection consists of images of 250 annotated materials that have been captured under six different illumination conditions, four viewpoints and four camera orientations, plus one additional image for each camera at a reddish spectrum. From the original annotations, we identified 24 materials coming from fabric/textile samples and selected one single color image per material, taken with a camera oriented perpendicularly to the sample surface (condition C1 in ALOT) under illumination by all hemispherically distributed light sources involved in the setup (condition L8 in ALOT).

Fig. 1: Overview of our approach for the derivation of a perceptual texture similarity metric: 1) In an initial step, stimuli are generated by cropping and re-synthesizing the images and removing the color information. In addition to obtaining a set of single gray-scale images (I), continuous interpolation strips between pairs of textures are synthesized. 2) As the computed sequences are only linear w.r.t. computational features, we perform an additional perceptual linearization by asking subjects to select the most linear interpolation among a set obtained via different power functions. From the resulting linearized strips (Tlinear), interpolants are synthesized by sampling at pre-defined locations along the sequence. 3) By letting people locate the generated interpolants on the corresponding sequence and analyzing the variance in the obtained ratings, we derive a distance matrix DI containing the perceptual distances between pairs of textures (Ii, Ij). Multidimensional scaling (MDS) is then used to compute a perceptual texture space (PTS). 4) Finally, we evaluate the correspondence between the dimensions of our PTS and user judgments for a set of textural features collected on a Likert scale (FI).

Furthermore, we expanded this selection with 6 supplementary samples from diverse material categories (paper, man-made, foliage, food), in order to study their behavior in the derived space of textures. This subset, at the lowest resolution provided by the authors (384 × 256 pixels) for presentation convenience, served as the initial input for our experiments.

3.2. Selection of texture synthesis method

Generating synthetic textures that are visually similar to their real-world counterparts, i.e. preserving the underlying texture characteristics that determine the overall appearance, has been approached with various strategies [29, 38]. The suitability of a synthesis technique is given in terms of the visual quality of the produced textures, the respective computation times, as well as its applicability for texture interpolation between two different sources, i.e. whether it is capable of producing smooth transitions from one texture to another. As a trade-off between these three specifications, we consider the Image Melding (IM) method [34] to be a valid choice for scalable texture synthesis, which also represents a state-of-the-art technique for the generation of interpolated sequences between

two given textures¹. In the scope of the supplemental material, we provide a more detailed comparison of the performances of different texture synthesis techniques as well as a comparison of our approach to a previous patch-based texture interpolation approach [33].

3.3. Stimuli generation

Our experimental studies rely on measuring the accuracy with which users are able to locate intermediate patches in an interpolated continuum between two samples. For the purpose of generating these sequences, we exploited the Image Melding algorithm. It is to be expected that a higher diversity in spatial structure between the images would lead to a wider search space for the PatchMatch method integrated in IM and, consequently, the interpolation process would produce less deterministic and more inconsistent results. For this reason, we reduced the structural variability of the initial images by using IM to re-synthesize a cropped, normalized patch (128 × 128 pixels) from the original image into a larger version with a resolution of 256 × 256 pixels. Additionally, some of the cropped patches were manually rotated to align the dominant structures in the respective textures, in order to facilitate the later interpolation process. The steps for the re-synthesis process are illustrated in Figure 1.

¹ Source code has been published by the authors at http://www.ece.ucsb.edu/~psen/melding
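The pre-processing steps that surround the IM synthesis itself (cropping a normalized patch as input to the re-synthesis and, later, discarding chroma for the final experimental stimuli) are conventional image operations. The following is a minimal Python sketch of them, assuming a simple center crop and a Rec. 601 luminance conversion; the file name and helper functions are illustrative and not part of the published IM code:

```python
import numpy as np
from PIL import Image

def center_crop(img, size=128):
    """Crop a size x size patch from the image center (input to the IM re-synthesis)."""
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))

def to_luminance(img):
    """Drop chroma (Rec. 601 weights) so that judgments are driven by structure, not color."""
    rgb = np.asarray(img.convert("RGB"), dtype=float)
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return Image.fromarray(y.astype(np.uint8), mode="L")

patch = center_crop(Image.open("alot_sample.png"))  # hypothetical file name
# The 256 x 256 re-synthesized result and the interpolation strips are
# grayscaled in the same way; full-color images are used inside the synthesis.
gray = to_luminance(patch)
```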


Table 2: The main parameter values of the Image Melding algorithm for the re-synthesis and the interpolation steps, respectively.

Parameter               | Re-synthesis | Interpolation
Patch size (in pixels)  | 10           | 10
Gradient weight         | 0.1          | 2
Random search           | off          | off
Uniform scale (range)   | [0.9, 1.2]   | [0.9, 1.3]
Relative scale (range)  | [0.9, 1.1]   | [0.9, 1.1]
Rotation (range)        | [−π/4, π/4]  | [−π/4, π/4]
Normalized weight       | off          | off
Interpolate gradients   | –            | on

Like most synthesis algorithms, the computational performance and visual quality of IM depend on a plethora of parameters that have to be tuned manually beforehand to achieve the best possible results. To allow a fair comparison of the resulting images, we include the most important IM parameter values employed to generate our stimuli in the middle column of Table 2. For further information on this set of parameters, we refer the reader to the original publication [34].

Color has been documented to be one of the most salient visual cues when judging image similarity [8]. In general, human subjects tend to balance color composition and structure differently when assessing texture similarity and, as a consequence, more effective metrics can be developed when color and structure are decoupled [10]. Given that this study particularly focuses on the structural similarities between textures, we followed this insight and performed the entirety of our experiments on luminance-only images, in accordance with related investigations [20, 9, 19, 7, 10]. Thereby, we ensure that the obtained perceptual distances are driven by textural features instead of merely color. However, for the stimuli generation step itself, we used full-color images as the input to the synthesis algorithm. Figure 2 displays the complete subset of 30 textures selected from the ALOT database as obtained after the processing steps (normalization, cropping, re-synthesis and color removal).

Using this image collection I, we again applied IM synthesis to compute all possible interpolations Tij ∈ T between image pairs (Ii, Ij) with Ii, Ij ∈ I and i ≠ j. The method is capable of searching for similar patches in both input images to produce a rendered strip presenting a smooth transition between the original textures. However, this transition is computed based on the linear interpolation of computational features, which differs from the perceptually linear interpolation required for the estimation of a perceptual distance metric. In other words, the interpolations do not necessarily represent a smooth, pleasant, evenly distributed transition between two textures to human observers (see Figure 6). The latter can be particularly observed for exemplars with features of different sizes or large variations in the magnitudes of the vertical and horizontal image gradients, on which IM relies to ensure the sharpness of the final sequences. Consequently, the interpolations do not describe a quantitative measurement scale and cannot be directly used as stimuli for the derivation of a perceptual similarity measure. Instead, an additional perceptual linearization of these interpolations is required (see Section 4.1).

4. Measuring perceptual similarity

In this section, we detail our approach for the derivation of a perceptual texture similarity measure. As illustrated in Figure 1, the initial generation of input stimuli is followed by the computation of perceptually linearized transition sequences between the different textures. Then, the analysis of the user variance in the localization of patches in-between the interpolation sequences allows the derivation of a matrix DI that contains the perceptual distances between pairs of textures (Ii, Ij) in the dataset. We further examine human ratings regarding concrete texture features and their connection with the main dimensions of the texture space. In the following, we describe these steps in more detail.

4.1. Perceptual linearization of texture interpolation

Originally, the Image Melding algorithm requires the patches in the interpolated image Tij to be similar to both source images Ii and Ij, where the relative contribution of each source is modulated by a linear numerical interpolation. In order to achieve a more evenly distributed transition between the sources in accordance with human perception, we propose to guide the synthesis process with the following bijective power function instead:

g(x) = \begin{cases} x^{(\alpha+1)} & \text{if } \alpha \ge 0 \\ 1 - (1-x)^{(1-\alpha)} & \text{else} \end{cases}    (1)
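For reference, Eq. (1) is straightforward to evaluate; the sketch below (our illustration, not the authors' code) implements g(x) and checks that it is monotonic on [0, 1] with fixed endpoints, i.e. bijective, for positive and negative exponents:

```python
import numpy as np

def g(x, alpha):
    """Bijective power mapping of Eq. (1), remapping the linear blend
    weight x in [0, 1] that modulates the contribution of each source."""
    x = np.asarray(x, dtype=float)
    if alpha >= 0:
        return x ** (alpha + 1)
    return 1.0 - (1.0 - x) ** (1.0 - alpha)

# g is the identity for alpha = 0 and bends the transition toward one of
# the sources otherwise, while always keeping g(0) = 0 and g(1) = 1.
xs = np.linspace(0.0, 1.0, 11)
for alpha in (-2.5, 0.0, 2.5):
    ys = g(xs, alpha)
    assert np.all(np.diff(ys) > 0) and ys[0] == 0.0 and ys[-1] == 1.0
```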



To determine the values of α that produce perceptually linear transitions, we designed a preliminary psychophysical experiment to compute this mapping for each texture combination. To that end, we generated a set of interpolated stimuli by fixing the values of α to α ∈ {−10, −5, −2.5, −1, −0.5, −0.25, −0.1, 0, 0.1, 0.25, 0.5, 1, 2.5, 5, 10}, resulting in a total of 435 × 15 = 6,525 image strips Tij(α) ∈ Text, each having a resolution of 1,152 × 256 pixels. The values of the synthesis parameters were kept constant during all calculations, yet they differ slightly from those in the re-synthesis step (see Table 2, right column). To obtain the user judgments, we developed a custom website with an interface in which all 15 interpolations for a concrete texture sequence Tij were presented to the users (one at a time), together with a slider widget and text with detailed instructions (see Figure 3). By moving the slider, the user was able to select the value of α for a concrete texture pair, and ultimately control which particular stimulus Tij(α) was displayed on the screen at a certain moment. The instructions encouraged the users to interact with this widget in order to "find the best slider position where the transition becomes as smooth (pleasant, even, linear) as possible". Each of these selections was considered to be a psychophysical assignment or task. We then arranged a psychophysical pipeline in which the study was divided into several micro-batches consisting of 20 assignments each, where 10 of them consisted of the same stimuli reversed horizontally. Besides, 5 additional assignments were included for training purposes and to ensure the validity of the participants' answers (see Section 4.2).
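As a sanity check on the size of this preliminary experiment, the stimulus enumeration can be reproduced in a few lines; this sketch only mirrors the counting reported above and does not perform any synthesis:

```python
from itertools import combinations

n_textures = 30                                   # the image collection I
alphas = [-10, -5, -2.5, -1, -0.5, -0.25, -0.1,
          0, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]       # the 15 slider positions

pairs = list(combinations(range(n_textures), 2))  # unordered pairs (Ii, Ij), i != j
strips = [(i, j, a) for (i, j) in pairs for a in alphas]

assert len(pairs) == 435
assert len(strips) == 435 * 15 == 6525            # image strips Tij(alpha) in Text
```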


Fig. 2: The collection of textures utilized in our experiments (I), after several processing steps (normalization, cropping, re-synthesis and color removal). Textures from non-fabric materials are presented in the lower row. The superimposed numbers correspond to the original material indexes in the ALOT database [46].


As crowdsourcing using Amazon Mechanical Turk (AMT) has been demonstrated to be a convenient practice for collecting perceptual data, we integrated our pipeline into AMT, and every individual micro-batch was presented to groups of between 12 and 15 Amazon workers. In total, 643 workers participated in this initial experiment. Next, we explain the pre-processing and verification of the answers given by the participants.


4.2. Data verification

During early trials with AMT, we observed that a substantial number of workers did not try to perform the task correctly or did not understand the instructions adequately. As reported by earlier investigations in the vision community, these sorts of problems are not unfamiliar to crowdsourcing platforms due to the limited control over the potential subjects. For instance, Bell et al. [47] devoted quite some effort to automatically identifying and discarding 31% of the 1,381 workers performing a brightness localization task. In a similar manner, we incorporated several tools into our pipeline to facilitate the realization of the assignments, to automatically detect ineligible workers and, in short, to ensure the quality of the resulting data. These tools are listed next; a sketch of the automatic screening logic follows the list:

1. A training step prior to the actual experimental procedure. Here, the same interface with more detailed instructions and convenient feedback indicating the correctness of the answer was presented to help the realization of the two initial assignments. Although the precise α-value for these stimuli was unknown, a ±2 confidence interval around a plausible exponent αi was established. Therefore, for each trial 5 out of 15 exponent values were considered a valid answer. The system would not allow participants to proceed until the given answer matched the selected interval. For this step, the pairs (I48, I181) and (I48, I177) were employed, as the respective input textures offered diverse characteristics that simplified this training phase. These assignments were kept constant for all the batches.

2. A testing step for the three assignments subsequent to the training. These assignments were also fixed for all the batches. In this case, we employed again the combination (I48, I177), in addition to (I61, I180) and (I34, I67). The same ±2 confidence interval around a reasonable answer was set. However, this time we rejected the workers whose selection of α lay outside that interval. The discarded workers were allowed to repeat the task from the beginning.

3. Examination of the worker's Root Mean Square (RMS) error between the ratings for the original and horizontally-reversed images. If the exhibited RMS was larger than a generous threshold, the worker's results were excluded from further analysis. From this step on, the respective thresholds were not fixed to a specific value, as they largely depended on the specific texture interpolations considered in each batch.

4. Inspection of the mean elapsed time per worker and assignment. The answers from workers who presented significantly lower values than the overall mean time in the same batch were equally removed from the posterior analysis.

5. Analysis of the inter-participant correlation coefficients. Workers whose coefficient was significantly lower than the overall mean in the same batch were considered to be outliers, and their results were excluded from further analysis.
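The post-hoc screening steps 3–5 amount to simple per-worker statistics. The sketch below is our illustration of that logic; the function name, array layout and concrete thresholds are hypothetical (in the actual pipeline, the thresholds were tuned per batch, and the correlation was computed between participants rather than against a mean profile):

```python
import numpy as np

def screen_workers(ratings, reversed_ratings, times,
                   rms_thresh, time_frac=0.5, corr_sigma=2.0):
    """ratings / reversed_ratings: (workers x assignments) arrays of slider
    responses for the original and horizontally-reversed stimuli;
    times: mean elapsed time per worker. Returns a mask of accepted workers."""
    # Step 3: consistency between original and mirrored stimuli.
    rms = np.sqrt(np.mean((ratings - reversed_ratings) ** 2, axis=1))
    ok_rms = rms <= rms_thresh
    # Step 4: flag suspiciously fast workers (far below the batch mean time).
    ok_time = times >= time_frac * times.mean()
    # Step 5: correlation of each worker with the batch's mean response.
    mean_response = ratings.mean(axis=0)
    corr = np.array([np.corrcoef(r, mean_response)[0, 1] for r in ratings])
    ok_corr = corr >= corr.mean() - corr_sigma * corr.std()
    return ok_rms & ok_time & ok_corr
```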

Indeed, we observed a high degree of correlation between the values for the RMS between similar stimuli, the mean elapsed time and the inter-participant correlation. This fact greatly facilitated the automatic detection and removal of ratings coming from ineligible workers. In general, we preferred to discard doubtful candidates and collect additional ratings, and we set the thresholds accordingly, in favor of the efficiency and reliability of the final data. Altogether, the ratings provided by 88 workers were not considered for further analysis, implying that we discarded 13.68% of the collected data, of which roughly a third was related to workers that submitted default answers (i.e. went through the stimuli leaving the slider in the default position). This user practice was also reported by Bell et al. [47]. From the remaining data, we obtained a total of 555 × 20 = 11,100 answers, 25.51 on average per texture combination Tij. The average standard error (SE) between participants for the mapped slider value of α across all texture combinations was 0.398, and the largest value (1.089) was found for the texture combination (I95, I209). The final texture synthesis is then guided by the numerical interpolation resulting from g(x), where the value of α is the average of the users' judgments for each Tij ∈ Text. A comparison between the newly generated, perceptually linear interpolations and the original ones is illustrated in Figure 6 for two example combinations. As can be observed, the new strips present a more evenly distributed transition between the two texture sources. In the scope of our evaluation, we demonstrate that these transitions do not only appear visually linear, but also represent an almost accurate interval scale which is well-suited for perceptual measurements. Therefore, we can use the generated strips as input stimuli for our main experimental setup, as described in the following section.


Fig. 3: Experimental interface for the perceptual linearization of texture interpolations. The users were asked to find the best slider position where the transition becomes as smooth, linear and evenly distributed as possible.

4.3. Formulation of texture similarity based on just-noticeable differences


The main purpose of this investigation is the measurement of perceptual similarities between pairs of textures. The method should allow precise control of the gathered data and mitigate issues concerning the complexity, parallelizability and time duration of the perceptual assignment, the sparsity of the PSM and its extension to unseen samples. With this in mind, we lean on the concept of JND, which has been extensively used in the domain of experimental psychology. The fundamental principle is the measurement of how much we need to change a stimulus along a given dimension before people notice the change [48]. To exploit this notion in our investigation, we analyze the accuracy with which users are able to locate intermediate patches in a linearly-changing continuum, obtained from the preceding perceptual linearization of interpolated texture sequences. The underlying intuition is that the users' precision when performing this task correlates with the pairwise texture similarity. More concretely, if the user responses for a particular pair show a low accuracy, the textures at the ends of the gradient are more likely to be perceptually similar, as the interpolants cannot be distinguished. Conversely, a high accuracy would indicate that the textures are perceived as more dissimilar, as the intermediate patches can be located precisely. This way, we avoid direct ratings of the similarity or dissimilarity between textures, which is an ill-defined task that depends on subjective user characteristics.

While the preceding perceptual linearization allows us to produce visually smooth transitions between pairs of material textures, we additionally generated intermediate interpolating patches (resolution of 256 × 256 pixels) by synthesizing fixed intervals of the interpolation g(x), where x ∈ {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}. An exemplar of these patches Tij(x) for a concrete texture pair (Ii, Ij), together with the corresponding linearized strip Tij ∈ Tlinear, is depicted in Figure 7. With this experimental setup, we are able to parallelize the similarity judgment task into a collection of micro-batches to be integrated in human intelligence platforms (AMT in our case). Each of these batches consisted of 40 assignments (including 6 repetitions to test user consistency) in which a randomized reference image (the interpolation sequence) was displayed, together with one of the corresponding intermediate patches, whose value of x was also randomized. Furthermore, we allowed reference images to be shown horizontally reversed during the assignments.
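The composition of one such micro-batch can be sketched as follows; this is our illustration, and the sampling scheme beyond what is stated above (40 assignments, 6 consistency repetitions, randomized x and optional mirroring) is hypothetical:

```python
import random

X_VALUES = [i / 10 for i in range(11)]   # sampled positions along g(x)

def make_batch(pairs, n_assignments=40, n_repeats=6, rng=random):
    """Assemble one micro-batch of localization assignments."""
    tasks = []
    for _ in range(n_assignments - n_repeats):
        i, j = rng.choice(pairs)                       # reference strip T_ij
        tasks.append({"pair": (i, j),
                      "x": rng.choice(X_VALUES),       # patch to locate
                      "mirrored": rng.random() < 0.5}) # reversed reference
    tasks += [dict(t) for t in rng.sample(tasks, n_repeats)]  # consistency checks
    rng.shuffle(tasks)
    return tasks
```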


Fig. 4: Experimental interface for the formulation of perceptual similarity based on JND. The users had to localize the region in the interpolation strip shown above that best corresponds to the image below.


Fig. 5: Experimental interface for the judgment of texture features. The users rated a set of texture qualities on a bipolar Likert scale.



Fig. 6: Two example interpolated strips, (I109, I48) on the left and (I61, I67) on the right, displaying the original transitions (top) and the linearized transitions (bottom). The adjacent plots depict the numerical interpolation guiding the synthesis, f(x) (original) and g(x) (linearized). Note how in the images above the texture sources do not contribute equally to the final image, while the images below present an evenly distributed contribution to the final interpolation.


Fig. 7: An example reference transition between two textures (I79, I182) employed in the main experimental setup (top row), as well as the intermediate interpolating patches to be located by the user in the reference image (bottom row).


Through additional explanatory instructions, the workers were asked to "locate the region within the interpolation sequence shown above that best corresponds to the image shown below". Figure 4 shows the experimental interface for a concrete stimuli assortment. A total of 1,183 workers participated in this experiment. Similarly to the previous study, we incorporated several mechanisms in our pipeline to facilitate the completion of the exercise and, ultimately, to automatically discard ineligible workers. We kept the same structure of 2+3 testing assignments and used the same texture pairs during the data verification process (see Section 4.2), while the confidence interval was set to a horizontal range of 400 pixels around the correct location of the reference patches in the continuum. On this occasion, however, we were not able to use the inter-participant correlation, as the stimuli presented to the workers were fully randomized. With this process, we detected 58 unsuitable workers whose results were not considered in the posterior analysis, which corresponds to discarding only 4.9% of the participants. From the rest of the workers, we gathered 1,125 × 40 = 45,000 user clicks, 103.45 on average per texture combination. For each Tij, the uncertainty of the participants was measured by computing the Root Mean Square (RMS) of the differences between the user clicks and the centroids calculated for each set of clicks Cij(x), where x is fixed to the values given earlier. Figure 8 exemplifies this calculation for two texture pairs with large and small perceived distances, respectively. The normalized RMS values shaped our perceptual similarity matrix SI, where sij describes the perceived similarity between textures Ii and Ij. Moreover, we introduced a user satisfaction survey at the end of the study that requested participants to rate, on a Likert scale from 1 to 7, whether the task was understandable, whether it was easy to accomplish, whether they enjoyed the realization of the experiment and whether the payment was fair.

The average answers to this questionnaire showed that the exercise was well understood (6.27), relatively easy to accomplish (5.54), quite enjoyable (5.66) and also adequately paid (4.48). The vast majority of positive comments left by the workers confirmed the good reception of our experimental setup and did not support possible evidence of disinterest, fatigue or related issues. From the derived similarity matrix and its corresponding distance matrix DI = 1/SI, we can compute a low-dimensional texture space by applying dimensionality reduction techniques such as multidimensional scaling (MDS). To also evaluate the degree of correlation between the main dimensions of this texture space and relevant textural features, we perform a respective analysis following the descriptions in the next section.
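Under these definitions, the path from raw clicks to the texture space is short. The following sketch is our reconstruction of it; the input layout and the normalization are assumptions, and only the relations RMS → sij, dij = 1/sij and the MDS step are taken from the text:

```python
import numpy as np
from sklearn.manifold import MDS

def pair_uncertainty(clicks_by_x):
    """clicks_by_x maps each sampled x to the array of horizontal click
    positions; returns the RMS of the click-to-centroid differences."""
    sq = [(np.asarray(c, float) - np.mean(c)) ** 2 for c in clicks_by_x.values()]
    return np.sqrt(np.concatenate(sq).mean())

def perceptual_texture_space(clicks, n):
    """clicks[(i, j)] holds the per-x click sets for each pair, i < j."""
    S = np.zeros((n, n))
    for (i, j), c in clicks.items():
        S[i, j] = S[j, i] = pair_uncertainty(c)
    S /= S.max()                             # normalized RMS = similarity s_ij
    D = np.zeros_like(S)
    off = ~np.eye(n, dtype=bool)
    D[off] = 1.0 / np.maximum(S[off], 1e-9)  # d_ij = 1 / s_ij (up to scaling)
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    return mds.fit_transform(D)              # 2-D perceptual texture space (PTS)
```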



Fig. 9: The left figure shows a histogram of the α-values from fitting g(x) to the user clicks. Note how the majority of them have rather low values (below 0.5). The average uncertainty across all workers and Tij, separated by the value of x, is displayed on the right. This is measured by the RMSE between the user clicks and the centroids.


Fig. 8: Uncertainty of the users represented by the RMSE across the measurement scale for two example pairs which exhibit low (I95, I209) and large (I95, I207) perceived distance. The error bars correspond to the RMSE for a particular x value, while the vertical lines coincide with the centroids Cij(x). The respective histograms of the click errors are shown to the right.


4.4. Perception of texture features

Whereas it is very unlikely that a meaningful set of texture features would apply to all kinds of surfaces [49], the correlation between the main dimensions of the derived texture space and well-known textural features would still provide interesting insights regarding the information contained in our PTS. For this purpose, we collected user ratings regarding a group of perceptual feature scales for each of the texture samples in I. In this regard, we employed the set of 12 perceptual qualities defined by Rao and Lohse [7] (contrast, repetition, granularity, randomness, roughness, feature density, directionality, structural complexity, coarseness, regularity, local orientation, and uniformity), which were rated using a 9-point Likert scale. This bipolar method measures the positive or negative response to a particular quality by labeling the two ends of the scale (e.g. low contrast (1) and high contrast (9)). Thus, 24 opposite feature adjectives were given to characterize our dataset of textures. Again, we leaned on AMT to gather 93 workers to perform this psychophysical experiment, whose corresponding web interface is depicted in Figure 5. This study was divided into three separate sub-batches, each of them having a group of 31 workers rating this set of features for 10 particular textures. From the user responses, a feature matrix FI was constructed by averaging these ratings, where fij represents the j-th perceptual feature for texture i.
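The subsequent analysis correlates each axis of the MDS embedding with each column of FI. A minimal sketch, assuming a (30 × 2) matrix of PTS coordinates and the averaged (30 × 12) feature matrix from above:

```python
import numpy as np
from scipy.stats import pearsonr

def feature_axis_correlations(pts, F):
    """pts: (n_textures x n_dims) MDS coordinates; F: (n_textures x n_features)
    averaged Likert ratings. Returns Pearson r and p per (dimension, feature)."""
    n_dims, n_feats = pts.shape[1], F.shape[1]
    r = np.zeros((n_dims, n_feats))
    p = np.zeros((n_dims, n_feats))
    for d in range(n_dims):
        for f in range(n_feats):
            r[d, f], p[d, f] = pearsonr(pts[:, d], F[:, f])
    return r, p
```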


5. Results

In this section, we first evaluate our perceptual similarity measures based on the distribution of the user clicks from the localization task, while paying particular attention to the level of measurement or scale of measure [50]. This is followed by the analysis of the PTS derived from the subjective data. Lastly, we examine the performance of our approach in comparison to the distances obtained from a free-grouping experiment.

5.1. Evaluation of perceptual similarity metric

To evaluate our metric, we study the distribution of the user clicks obtained from the main experimental study. This includes the validation of the hypothesis that the derived perceptual similarity correlates with larger variances in the user ratings (as described before), the analysis of whether the texture similarity measure shows approximately interval-scale behavior, as well as the evaluation of the resulting distances themselves.

In order to verify the presumed correlation of perceptual similarity with the user variances, we inspected the distributions of the positions on the horizontal axis selected by the participants when locating intermediate texture patches, as exemplified for two sample interpolations in Figure 8. Indeed, the variances in the user judgments are lower for visually different textures, e.g. (I95, I207), which indicates that the interpolants could be located more accurately. In contrast, the interpolants were less reliably located for similar image pairs, e.g. (I95, I209), as the visual differences at neighboring locations along the sequence were hardly noticeable.

By relying on perceptually linear interpolation stimuli, the user performance in locating certain patches along the texture strips is expected to follow measurement scale characteristics. Indeed, the level of measurement [50] for a concrete pair is intimately related to the uncertainty with which users perform the task, as can be observed when inspecting the positions of the centroids in Figure 8. For small overall RMSE values, the texture sequence defines an almost accurate interval-like scale, whereas for large RMSE values this scale becomes less precise. Assuming that the precision of the linearized images in representing an interval scale is associated with the accuracy of approximating the user ratings with a linear function, we fitted g(x) to the data points collected for each Tij ∈ Tlinear. Here, a value of α ≈ 0 would result in an almost-linear function g(x) ≈ x. In general, the obtained results indicate that the user responses are well approximated by functions close to linear, since nearly all g(x) present α-values that approach zero (see Figure 9a), with an average of 0.21 across Tlinear, hence confirming our expectations. Figure 10 shows the user clicks and fitted functions for the two texture sequences shown in Figure 6. As illustrated by these plots, fitting g(x) to the data points produces almost-linear functions. In addition, a high coefficient of determination (R²) when fitting f(x) = x to the user ratings would indicate that the variability in the data is appropriately explained by a linear model. In fact, R² shows acceptable values, i.e. an average of R² = 0.724 is obtained across all Tij. Both indicators support the approximation of interval-scale behavior for most of the texture combinations. Furthermore, we examined the average RMSE across all pairs and users, separated by the x values of the intermediate patches (see Figure 9b). As shown, the errors are stable across the values of x in the transition, confirming that the subjects' uncertainty does not vary along the scale. The robustness of the measuring scale was further evaluated by comparing the matrices obtained when taking into account either the user data coming from original stimuli or the data from horizontally-reversed stimuli only. Calculating the RMSE between these two matrices produces a relatively low error value (0.138), implying that the experimental measurements are robust to the horizontal reversal of the stimuli.
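The interval-scale test described above reduces to a one-parameter fit of g(x) to the normalized click positions, plus an R² against the identity. A sketch of this analysis (our reconstruction, using scipy's curve_fit):

```python
import numpy as np
from scipy.optimize import curve_fit

def g(x, alpha):
    """The bijective power function of Eq. (1)."""
    x = np.asarray(x, dtype=float)
    if alpha >= 0:
        return x ** (alpha + 1)
    return 1.0 - (1.0 - x) ** (1.0 - alpha)

def scale_linearity(x_true, x_clicked):
    """x_true: ground-truth patch positions in [0, 1]; x_clicked: normalized
    user clicks. Returns the fitted alpha and the R^2 of the model f(x) = x."""
    (alpha,), _ = curve_fit(g, x_true, x_clicked, p0=[0.0])
    ss_res = np.sum((x_clicked - x_true) ** 2)             # residuals of f(x) = x
    ss_tot = np.sum((x_clicked - np.mean(x_clicked)) ** 2)
    return alpha, 1.0 - ss_res / ss_tot                    # alpha ~ 0 => near-linear scale
```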

ACCEPTED MANUSCRIPT Mart´ın et al. / Computers & Graphics (2019)
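To make this analysis concrete, the following sketch fits a one-parameter function to the clicks of a single texture pair and reports the quantities discussed above: α, the R² of the identity fit f(x) = x, and the RMSE between the clicks and the per-position centroids. The exact parametric form of g(x) is defined earlier in the paper; the power family used here is only a hypothetical stand-in sharing the key property that α = 0 yields g(x) = x, and all names are illustrative.

import numpy as np
from scipy.optimize import curve_fit

def g(x, alpha):
    # Hypothetical one-parameter family standing in for the g(x) defined
    # earlier in the paper: alpha = 0 yields the identity g(x) = x.
    return x ** (1.0 + alpha)

def analyze_pair(x_shown, x_clicked):
    # x_shown: true positions of the intermediate patches (in [0, 1]);
    # x_clicked: the positions selected by the participants.
    x_shown = np.asarray(x_shown, dtype=float)
    x_clicked = np.asarray(x_clicked, dtype=float)

    # Fit g(x) to the clicks to obtain the linearity parameter alpha.
    (alpha,), _ = curve_fit(g, x_shown, x_clicked, p0=[0.0])

    # Coefficient of determination for the identity fit f(x) = x.
    ss_res = np.sum((x_clicked - x_shown) ** 2)
    ss_tot = np.sum((x_clicked - x_clicked.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    # RMSE between the clicks and the per-position centroids C_ij(x),
    # i.e. the uncertainty from which the pairwise distance is derived.
    sq_err, n = 0.0, len(x_clicked)
    for x in np.unique(x_shown):
        clicks = x_clicked[x_shown == x]
        sq_err += np.sum((clicks - clicks.mean()) ** 2)
    rmse = np.sqrt(sq_err / n)
    return alpha, r2, rmse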

Fig. 10: Location of the user clicks from the main experimental study along the horizontal axis, for two example image pairs (I48, I109) and (I61, I67). The plots show that fitting g(x) to the points produces almost-linear functions (α ≈ 0); in addition, adequate R² values are obtained when fitting a linear function.

Fig. 11: Normalized dissimilarity matrix DI resulting from the user judgments in the localization experiment. The red lines separate the textures obtained from non-fabric samples.

Finally, we also analyzed whether the perceptual matrix derived from our studies satisfies the requirements of a metric. The obtained distance matrix DI, shown in Figure 11, fulfills the conditions of non-negativity (d_ij > 0 for all i ≠ j) and symmetry (d_ij = d_ji). Moreover, the triangle inequality d_ij + d_jk ≥ d_ik is satisfied for 99.68% of the triplets (i, j, k). However, it does not fulfill the identity of indiscernibles (d_ii = 0), since our method yields distances d_ii > 0 even for identical texture patches. To resolve this limitation, we simply set the elements on the diagonal to zero.
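These checks are straightforward to reproduce computationally. The following minimal sketch (function names are our own) evaluates the three testable axioms on a given square distance matrix and returns the fraction of ordered triplets satisfying the triangle inequality:

import itertools
import numpy as np

def check_metric_axioms(D, tol=1e-12):
    # D: square distance matrix, e.g. the normalized matrix DI.
    n = D.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)
    non_negative = bool(np.all(D[off_diagonal] > 0))
    symmetric = bool(np.allclose(D, D.T))

    # Fraction of ordered triplets (i, j, k) with d_ij + d_jk >= d_ik.
    total, satisfied = 0, 0
    for i, j, k in itertools.permutations(range(n), 3):
        total += 1
        if D[i, j] + D[j, k] >= D[i, k] - tol:
            satisfied += 1
    return non_negative, symmetric, satisfied / total

# Identity of indiscernibles is then enforced explicitly:
# np.fill_diagonal(D, 0.0)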

5.2. Analysis of the perceptual texture space based on the JND metric

Subspace transformation techniques have often been employed to reveal the underlying dimensions of datasets [21, 19, 27, 7, 8, 13]. Although linear approaches like principal component analysis (PCA) and classical multidimensional scaling (MDS) have been widely used in early texture perception studies, they lack the capability to unveil the non-linear structures that are very likely hidden in data coming from user experiments. In contrast, isometric feature mapping (Isomap) [51] is a non-linear method that computes the distances between points based on a weighted graph, which makes it a powerful technique when the PSM is sparse. Since we chose to collect user ratings for all possible pairs of textures in I, we obtain a fully populated distance matrix DI by design. Hence, we opted for applying non-classical, non-metric MDS to DI, which estimates a low-dimensional embedding of the data while maintaining the ranks of the dissimilarities.

First, in order to analyze the intrinsic dimensionality of the underlying PTS, we computed Kruskal's normalized stress measure [52] for different numbers of dimensions (see Figure 12a). The number of dimensions can then be estimated by looking for the elbow where the graph's steep slope ceases and the remaining stress values level off. In our case, there is no obvious choice, as this elbow is not evident in the plot, although three dimensions seem a reasonable and easy-to-visualize choice for representing the dimensionality of the data, with a fair Kruskal's stress value (0.19).

Fig. 12: Kruskal's normalized stress for different numbers of dimensions, resulting from the two experimental procedures: (a) our approach; (b) the grouping experiment. In our approach, there is no obvious choice for the intrinsic dimensionality, while in the free-grouping approach three dimensions seem to be a suitable choice.
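A dimensionality scan of the kind shown in Figure 12 can be sketched with scikit-learn's non-metric MDS as follows. Constructor arguments and the availability of a built-in normalized stress vary between library versions, so this is an approximation under stated assumptions rather than the exact procedure used here; in particular, the normalization of sklearn's raw stress_ below only approximates Kruskal's stress-1, which is defined on the monotonically transformed disparities.

import numpy as np
from sklearn.manifold import MDS

def stress_per_dimensionality(D, max_dim=10, seed=0):
    # D: fully populated, symmetric dissimilarity matrix, zero diagonal.
    stress = {}
    for k in range(1, max_dim + 1):
        mds = MDS(n_components=k, metric=False,
                  dissimilarity='precomputed',
                  n_init=4, max_iter=500, random_state=seed)
        mds.fit(D)
        # Approximate Kruskal stress-1: sqrt(raw stress / (0.5 * sum of
        # squared dissimilarities)).
        stress[k] = np.sqrt(mds.stress_ / (0.5 * np.sum(D ** 2)))
    return stress  # plot stress vs. k and look for the elbow (cf. Fig. 12)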

Figure 13 presents the output of applying non-classical, non-metric MDS to our data in the XY, XZ and YZ planes, where X, Y and Z represent the three main dimensions of the perceptual space. The distribution of the projected samples along the axes of the PTS indicates the similarity or dissimilarity of the textures according to our metric. As can be observed, the general distinctness between samples is well preserved by our model: textures with comparable appearance under visual inspection lie nearby in the computed perceptual space, e.g. (I180, I181), (I95, I183), (I61, I106) or (I209, I215). Conversely, pairs with relatively large visual differences, such as (I61, I67), (I106, I177) or (I85, I207), are situated on opposite sides of the axes that constitute the main dimensions of the PTS. This observation is in accordance with our initial hypothesis regarding the close relationship between user variance and perceived texture similarity. The non-fabric samples tend to exhibit large dissimilarity values in the matrix DI with respect to the remaining textures. As a consequence, those whose structure is largely dissimilar from the rest of the samples (i.e. I62, I75, I85, I110) are located in outer regions of the sub-dimensional space, while the samples I28 and I57 are situated closer to the bulk of the sample points.

Fig. 13: The first three dimensions resulting from applying non-classical, non-metric MDS to the dissimilarity matrix DI obtained with our method. Blue points represent the projected positions of the fabric material samples, while red points illustrate the non-fabric subset. Correlation coefficients between FI and the main perceptual dimensions are represented with green arrows.

Even though the distinctness between the fabric and non-fabric materials is preserved in the PTS to a great extent, our metric did not fully capture some obvious texture differences for certain pairs such as (I75, I177), (I28, I57) or (I110, I67). This is in agreement with previous findings reporting the difficulty of quantifying large texture dissimilarities [10].

Moreover, we investigated a possible interpretation of the perceptual dimensions based on the attributes in FI. To this end, we followed previous investigations and calculated the correlation coefficient between the average feature ratings for each texture sample (the columns of FI) and their respective coordinates in the three-dimensional space. These coefficients are overlaid on Figure 13 and depicted in greater detail in Figure 15. As can be observed, each dimension corresponds to a combination of features: Dimension #1 shows a strong correlation with structural complexity, feature density, contrast and granularity; Dimension #2 corresponds to the qualities describing randomness, roughness and coarseness; and Dimension #3 is dominated by randomness, uniformity, repetitiveness, directionality, local orientation and regularity. Interestingly, even though the stimuli tested in our experiments differ, the arrangement of salient features in each dimension resembles those obtained in similar earlier investigations [19, 7], with variations in their weight and ordering w.r.t. the PTS.

Fig. 15: Correlation coefficients between the feature ratings (FI) and the three main dimensions of the PTS. The largest values for each dimension are highlighted in red.
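This correlation analysis amounts to one Pearson coefficient per (feature, dimension) pair. A minimal sketch, with illustrative names, is:

import numpy as np

def feature_dimension_correlations(F, X):
    # F: (n_textures x n_features) averaged feature ratings (FI);
    # X: (n_textures x 3) MDS coordinates of the texture samples.
    n_features, n_dims = F.shape[1], X.shape[1]
    C = np.zeros((n_features, n_dims))
    for f in range(n_features):
        for d in range(n_dims):
            # Pearson correlation between feature f and dimension d.
            C[f, d] = np.corrcoef(F[:, f], X[:, d])[0, 1]
    return C  # C[f, d] as overlaid in Figs. 13 and 15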

5.3. Comparison to the perceptual texture space based on free-grouping

Perceptual experiments based on clustering and free-grouping procedures have been demonstrated to yield reliable measures of the perceptual similarity between textures, despite some limitations. Thus, we conducted a bottom-up free-grouping experiment in a similar fashion to Rao and Lohse [7], which served as a yardstick to evaluate the performance of the proposed metric. Each texture in our database I was printed on 2 × 2 inch matte photographic paper at a resolution of 400 points per inch (ppi). We presented these prints to the participants, who were instructed to "group the textures according to their perceived visual similarity" and to avoid, as much as possible, clustering based on image semantics. In total, 14 subjects participated; all of them provided informed consent and were compensated economically for their participation. The similarity scores t_ij ∈ RI were defined as the ratio between the number of times that images i and j were grouped together and the total number of grouping steps, across all participants. As before, we calculated the final distance matrix as GI = 1/RI.

Given the reduced size of the evaluated dataset, the free-grouping task yields a densely populated matrix GI, which fulfills the conditions of non-negativity, symmetry and identity of indiscernibles. This matrix satisfies the triangle inequality for 99.87% of the triplets. In addition, the Pearson correlation coefficient between DI and GI exhibits a relatively high value of 0.632. In the same way as in the previous section, we applied non-classical, non-metric MDS to GI and computed Kruskal's normalized stress measure in order to explore the intrinsic dimensionality of the PTS (see Figure 12b). Here, the stress values present a clear elbow between dimensions 3 and 4. Therefore, the texture space can be confidently represented by three perceptual dimensions, with an excellent Kruskal's stress value (0.05).

Figure 14 exhibits the output of applying MDS to the distance matrix resulting from the free-grouping experiment. At first glance, we notice a clear circular/spherical shape in the arrangement of the data points when projected onto the first three perceptual dimensions. This structure oversimplifies the dissimilarities between certain samples in our dataset, as illustrated by the identifiable cluster formed by the rather different textures I34, I75, I109, I177 and I205. This effect has also been observed in previous similarity studies [8], as the sphere is the optimal geometrical structure for preserving the reduced range of distance values present in GI. Aside from this, most of the similarities between the fabric textures of the dataset are adequately modeled in the PTS. The non-fabric samples, however, form an identifiable cluster in the first two dimensions, which does not correspond with the actual visual appearance of the textures. Seemingly, the experimental procedure biased participants to group together quite diverse textures that shared an even larger perceived dissimilarity with the rest of the samples.

Fig. 14: The first three dimensions resulting from applying non-classical, non-metric MDS to the dissimilarity matrix GI resulting from the grouping procedure. Blue points represent the projected positions of the fabric material samples, while red points illustrate the non-fabric subset. Correlation coefficients between FI and the main perceptual dimensions are represented with green arrows.
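A minimal sketch of how the grouping-based matrices described in this subsection could be assembled is given below. The data layout and our reading of "grouping steps" (the total number of clusters formed across all participants) are assumptions; all names are illustrative.

import numpy as np

def grouping_distance_matrix(groupings, n_textures):
    # groupings: one list of clusters per participant, where each cluster
    # is a list of texture indices that were grouped together.
    R = np.zeros((n_textures, n_textures))
    n_steps = 0
    for clusters in groupings:
        n_steps += len(clusters)  # assumed meaning of "grouping steps"
        for cluster in clusters:
            for i in cluster:
                for j in cluster:
                    if i != j:
                        R[i, j] += 1.0
    R /= n_steps  # similarity scores t_ij (co-grouping frequency)
    with np.errstate(divide='ignore'):
        G = np.where(R > 0, 1.0 / R, np.inf)  # GI = 1/RI, elementwise
    np.fill_diagonal(G, 0.0)  # identity of indiscernibles
    return R, G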

6. Conclusions and future work

In this paper, we presented a novel formulation for measuring the mutual perceptual similarity, or distance, between material textures. The approach relies on patch-based statistical texture synthesis to generate the experimental stimuli and is inspired by the notion of just-noticeable differences. We exploit the hypothesis that, given a synthesized interpolated sequence between two images, the confidence with which subjects are able to locate intermediate specimens in the continuum correlates with the perceived similarity of the corresponding textures. To successfully apply this insight, we proposed a perceptual linearization of the texture interpolation results, which otherwise only represent a linear transition in the space of the computational features used. This perceptual linearization produces a consistent quantitative scale. To validate our approach, we analyzed a plausible interpretation of the underlying dimensions and compared our experimental results with those from a free-sorting experiment. Indeed, our technique produces a more descriptive space of textures that strongly correlates with earlier studies.

In order to generate the interpolated image strips, we made use of a fully automatic, state-of-the-art texture synthesis algorithm (Image Melding [34]), which represents a good compromise between time performance and visual quality. However, this method is less reliable when the structural content of the textures is quite diverse, and consequently the outcome cannot be used as a scale for measuring texture similarity. The perceptual linearization step resolves this problem to a large extent. The soundness of the scale is further confirmed by fitting an almost-linear function to the user ratings and by observing the average RMSE for all pairwise combinations. However, even after the linearization process, some texture pairs still present relatively abrupt transitions, suggesting that the interpolation between rather dissimilar samples may not always be meaningful. Indeed, our pipeline focuses on fine-grained similarities between textures from materials that share a certain degree of characteristics and are situated in the medium to high end of the similarity scale, making it unsuitable for non-stationary images or nearly identical textures. The latter remains one of the limitations of applying texture synthesis algorithms in general, as concluded by Kolář et al. [38], and therefore also applies to the problem of measuring perceived similarity.

More recent developments in texture synthesis have employed neural networks trained for object recognition in order to transfer styles between natural or artistic images [36], as well as to synthesize interpolated images by means of a simple linear interpolation of deep convolutional features [37]. These methods, however, are not optimized for preserving the details of material texture appearance, and the derived images may still be noisy, containing artifacts produced by the neural networks. Still, our pipeline is not restricted to a concrete synthesis algorithm.

Furthermore, the computed metric does not provide identity of indiscernibles by definition, as the users' uncertainty from which we derive the similarities will remain within a range given by the size of the interpolated strip. Hence, the resulting distances will never become exactly zero.



By and large, our methodology aims to alleviate some of the obstacles posed by earlier approaches to gathering image/texture similarity data, namely the parallelizability and duration of the perceptual task itself, the extensibility to new sample points and the sparsity of the resulting matrix. In order to avoid the bias inherent to subjective similarity estimations, we turned the noise intrinsic to a localization exercise to our benefit. As one of our goals is to study fine-grained similarities between fabric textures, we opted for collecting all possible pairwise similarities in our database. With this design, the complexity of our solution quickly increases with the size of the database (O(n²)), as adding a new material k involves conducting an initial linearization plus additional experiments for all possible combinations between material k and the existing dataset. Still, our proposed metric can equally be utilized to collect similarities for an incomplete set of pairs, resulting in a sparse distance matrix. In this case, alternative dimensionality reduction techniques that can deal with sparsity, like Isomap [51] or the MDS methodology proposed by Wills et al. [13], would have to be applied.

For the purpose of performance evaluation, we compared the proposed metric to the similarity derived from a free-grouping experiment. Although both distance metrics present an interesting degree of correlation, the examination of the derived low-dimensional texture spaces reveals that the data obtained from grouping is less precise and exhibits a less informative circular structure, agreeing with previous studies comparing data collection paradigms [28, 49]. However, this technique permits a smoother clustering of dissimilar textures and offers an interesting alternative, especially for the development of qualitative similarity metrics and classification algorithms.

The popularity of deep learning techniques entails an increasing demand for large annotated datasets. Future investigations in computer graphics may employ our perceptually-inspired similarity metric to annotate texture distances for later training fully computational similarity algorithms. Still, for the purpose of making accurate predictions, a far larger database of material images from retail and fashion stores would need to be collected. Admittedly, our experimental pipeline could also benefit from the utilization of regression methods combined with deep image features in order to automatically compute the linearization parameter (α), which, however, also relies on the availability of a large texture database. Finally, the arrangement of the data points in Figure 10 (left) shows a noticeable sigmoidal (s-shaped) residual. Higher-order linearization could be incorporated into following iterations of our methodology; this would likely produce more uniform transitions between the original exemplars and hence more reliable measurements.


Acknowledgments

This work was partially funded by the X-Rite Chair and Graduate School for Digital Material Appearance. We further thank Julian Iseringhausen for valuable discussions.

Declaration of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Fleming, RW, Wiebel, C, Gegenfurtner, K. Perceptual qualities and material classes. Journal of Vision 2013;13(8):9.
[2] Schwartz, G, Nishino, K. Automatically discovering local visual material attributes. In: IEEE Conf. on Computer Vision and Pattern Recognition (CVPR). 2015, p. 3565–3573.
[3] Sharan, L, Liu, C, Rosenholtz, R, Adelson, EH. Recognizing materials using perceptually inspired features. International Journal of Computer Vision 2013;103(3):348–371.
[4] Weinmann, M, Gall, J, Klein, R. Material classification based on training data synthesized using a BTF database. In: European Conf. on Computer Vision. Springer; 2014, p. 156–171.
[5] Shotton, J, Winn, J, Rother, C, Criminisi, A. TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In: Computer Vision – ECCV 2006. Berlin, Heidelberg: Springer; 2006, p. 1–15.
[6] Kasson, JM, Plouffe, W. An analysis of selected computer interchange color spaces. ACM Trans. on Graphics 1992;11(4):373–405.
[7] Rao, AR, Lohse, GL. Towards a texture naming system: Identifying relevant dimensions of texture. Vision Research 1996;36(11):1649–1669.
[8] Rogowitz, BE, Frese, T, Smith, JR, Bouman, CA, Kalin, EB. Perceptual image similarity experiments. In: Human Vision and Electronic Imaging III; vol. 3299. Int. Society for Optics and Photonics; 1998, p. 576–591.
[9] Clarke, A, Halley, F, Newell, A, Griffin, L, Chantler, M. Perceptual similarity: A texture challenge. In: Proc. of the British Machine Vision Conf. BMVA Press; 2011, p. 120.1–120.10.
[10] Zujovic, J, Pappas, TN, Neuhoff, DL, van Egmond, R, de Ridder, H. Effective and efficient subjective testing of texture similarity metrics. Journal of the Optical Society of America A 2015;32(2):329–342.
[11] Adelson, E. On seeing stuff: the perception of materials by humans and machines. Proc. of SPIE 2001;4299:1–12.
[12] Martín, R, Iseringhausen, J, Weinmann, M, Hullin, MB. Multimodal perception of material properties. In: ACM SIGGRAPH Symposium on Applied Perception. SAP '15; New York, NY, USA: ACM; 2015, p. 33–40.
[13] Wills, J, Agarwal, S, Kriegman, D, Belongie, S. Toward a perceptual space for gloss. ACM Trans. on Graphics 2009;28(4):103:1–103:15.
[14] Serrano, A, Gutierrez, D, Myszkowski, K, Seidel, HP, Masia, B. An intuitive control space for material appearance. ACM Trans. on Graphics 2016;35(6):186:1–186:12.
[15] Martín, R, Weinmann, M, Hullin, MB. Digital transmission of subjective material appearance. Journal of WSCG 2017;25(2):57–66.
[16] Hallett, C, Johnston, A. Fabric for Fashion: The Swatch Book. Laurence King Publishing; 2010.
[17] Landy, MS, Graham, N. Visual perception of texture. The Visual Neurosciences 2004;1:1106.
[18] Halley, F. Perceptually relevant browsing environments for large texture databases. Ph.D. thesis; Heriot-Watt University; 2012.
[19] Liu, J, Dong, J, Cai, X, Qi, L, Chantler, M. Visual perception of procedural textures: Identifying perceptual dimensions and predicting generation models. PLOS ONE 2015;10(6):1–22.
[20] Balas, B. Attentive texture similarity as a categorization task: Comparing texture synthesis models. Pattern Recognition 2008;41(3):972–982.
[21] Gurnsey, R, Fleet, DJ. Texture space. Vision Research 2001;41(6):745–757.
[22] Pappas, TN, Neuhoff, DL, de Ridder, H, Zujovic, J. Image analysis: Focus on texture similarity. Proc. of the IEEE 2013;101(9):2044–2057.
[23] Manjunath, BS, Ma, WY. Texture features for browsing and retrieval of image data. IEEE Trans. on Pattern Analysis and Machine Intelligence 1996;18(8):837–842.
[24] Ojala, T, Pietikäinen, M, Mäenpää, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. on Pattern Analysis and Machine Intelligence 2002;24(7):971–987.
[25] Varma, M, Zisserman, A. A statistical approach to texture classification from single images. International Journal of Computer Vision 2005;62(1):61–81.
[26] Dong, X, Methven, TS, Chantler, MJ. How well do computational features perceptually rank textures? A comparative evaluation. In: Proc. of International Conf. on Multimedia Retrieval. ICMR '14; New York, NY, USA: ACM; 2014, p. 281–288.
[27] Lou, J, Qi, L, Dong, J, Yu, H, Zhong, G. Learning perceptual texture similarity and relative attributes from computational features. In: 2016 International Joint Conf. on Neural Networks (IJCNN). 2016, p. 2540–2546.
[28] Bijmolt, THA, Wedel, M. The effects of alternative methods of collecting similarity data for multidimensional scaling. International Journal of Research in Marketing 1995;12(4):363–371.
[29] Wei, LY, Lefebvre, S, Kwatra, V, Turk, G. State of the art in example-based texture synthesis. In: Eurographics 2009, State of the Art Report. Munich, Germany: Eurographics Association; 2009, p. 93–117.
[30] Efros, AA, Leung, TK. Texture synthesis by non-parametric sampling. In: Proc. of the International Conf. on Computer Vision; vol. 2 of ICCV '99. Washington, DC, USA: IEEE Computer Society; 1999, p. 1033–1038.
[31] Portilla, J, Simoncelli, EP. A parametric texture model based on joint statistics of complex wavelet coefficients. International Journal of Computer Vision 2000;40(1):49–70.
[32] Gatys, LA, Ecker, AS, Bethge, M. Texture synthesis using convolutional neural networks. In: Proc. of the International Conf. on Neural Information Processing Systems; vol. 1 of NIPS'15. Cambridge, MA, USA: MIT Press; 2015, p. 262–270.
[33] Ruiters, R, Schnabel, R, Klein, R. Patch-based texture interpolation. Computer Graphics Forum 2010;29(4):1421–1429.
[34] Darabi, S, Shechtman, E, Barnes, C, Goldman, DB, Sen, P. Image Melding: Combining inconsistent images using patch-based synthesis. ACM Trans. on Graphics 2012;31(4):82:1–82:10.
[35] Barnes, C, Shechtman, E, Goldman, DB, Finkelstein, A. The generalized PatchMatch correspondence algorithm. In: European Conf. on Computer Vision. Springer; 2010, p. 29–43.
[36] Gatys, LA, Ecker, AS, Bethge, M. Image style transfer using convolutional neural networks. In: IEEE Conf. on Computer Vision and Pattern Recognition. CVPR '16; 2016, p. 2414–2423.
[37] Upchurch, P, Gardner, JR, Pleiss, G, Pless, R, Snavely, N, Bala, K, et al. Deep feature interpolation for image content changes. In: Conf. on Computer Vision and Pattern Recognition (CVPR). 2017, p. 6090–6099.
[38] Kolář, M, Debattista, K, Chalmers, A. A subjective evaluation of texture synthesis methods. Computer Graphics Forum 2017;36(2):189–198.
[39] Hossain, S, Serikawa, S. Texture databases: a comprehensive survey. Pattern Recognition Letters 2013;34(15):2007–2022.
[40] Brodatz, P. Textures: A Photographic Album for Artists and Designers. Dover Publications; 1966.
[41] MeasTex image texture database. http://www.texturesynthesis.com/meastex/meastex.html; 1997. Last accessed October 2018.
[42] Xu, Y, Ji, H, Fermüller, C. Viewpoint invariant texture description using fractal analysis. International Journal of Computer Vision 2009;83(1):85–100.
[43] Chantler, MJ. The effect of variation in illuminant direction on texture classification. Ph.D. thesis; Heriot-Watt University; 1994.
[44] Dana, KJ, van Ginneken, B, Nayar, SK, Koenderink, JJ. Reflectance and texture of real-world surfaces. ACM Trans. on Graphics 1999;18(1):1–34.
[45] PhoTex database. http://www.macs.hw.ac.uk/texturelab/resources/databases/photex/; 2018. Last accessed October 2018.
[46] University of Amsterdam. Amsterdam Library of Textures (ALOT). http://aloi.science.uva.nl/public_alot/; 2009. Last accessed October 2018.
[47] Bell, S, Bala, K, Snavely, N. Intrinsic images in the wild. ACM Trans. on Graphics 2014;33(4).
[48] Cunningham, D, Wallraven, C. Experimental Design: From User Studies to Psychophysics. 1st ed.; Natick, MA, USA: A. K. Peters, Ltd.; 2011.
[49] Clarke, AD, Dong, X, Chantler, MJ. Does free-sorting provide a good estimate of visual similarity? Predicting Perceptions 2012:17–20.
[50] Stevens, SS. On the theory of scales of measurement. Science 1946;103(2684):677–680.
[51] Tenenbaum, JB, de Silva, V, Langford, JC. A global geometric framework for nonlinear dimensionality reduction. Science 2000;290(5500):2319–2323.
[52] Kruskal, JB. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 1964;29(1):1–27.