Evaluation and assessment of homogeneity in images. Part 1: Unique homogeneity percentage for binary images


Chemometrics and Intelligent Laboratory Systems 171 (2017) 26–39


Leandro de Moura França a, Jose Manuel Amigo b,*, Carlos Cairós c, Manel Bautista d, Maria Fernanda Pimentel e

a Department of Fundamental Chemistry, Federal University of Pernambuco, Av. Prof. Moraes Rego, 1235, Cidade Universitária, Recife, Brazil
b Department of Food Sciences, University of Copenhagen, Rolighedsvej 30, DK-1958 Frederiksberg-C, Denmark
c Christian Doppler Laboratory for Cavitation and Micro-Erosion, Drittes Physikalisches Institut, Germany
d Novartis Pharma AG, CH-4056 Basel, Switzerland
e Department of Chemical Engineering, Federal University of Pernambuco, Av. Prof. Moraes Rego, 1235, Cidade Universitária, Recife, Brazil

* Corresponding author: Department of Food Sciences, University of Copenhagen, Rolighedsvej 30, DK-1958 Frederiksberg-C, Denmark. E-mail address: [email protected] (J.M. Amigo).

Keywords: Image analysis; Homogeneity distribution; Macropixel analysis; Continuous-Level Moving Block; Binary images

Abstract

Texture feature analysis is one of the most important approaches for the assessment of homogeneity in images. However, existing texture features are either relative, requiring comparison with a standardized set of images, or they require further multivariate models to predict or classify the images according to their features. In this first work, we propose an alternative and novel methodology to calculate a percentage of homogeneity using only the self-information contained in the image. The methodology is based on macropixel analysis theory and the generation of what is called the "homogeneity curve". The homogeneity curve is explored in depth, establishing what can be considered the most homogeneous and the most inhomogeneous distribution for every case. This first work postulates the theory and demonstrates its usefulness with several examples applied to binary images, providing a theoretical framework to fully understand the homogeneity curve and postulating a mathematical model to parametrize homogeneity and its plausible deviations.

1. Some considerations about texture analysis and homogeneity

One of the main keystones of image analysis is the characterization of the different spatial positions and colors of the elements in an image. This is known as image feature extraction. Acknowledging that a feature is a distinguishing characteristic or attribute of an image, features can be natural (defined by the visual appearance of the image, e.g., luminosity) or artificial (resulting from image manipulation, e.g., the amplitude of the image histogram) [1]. Let us contextualize this concept using the simplest image type, a gray-scale image in which each pixel contains an intensity value. In a gray-scale image, two main contextual features define the information about the objects in the image: 1) the tone, which is based on the variation of the gray levels within a resolution cell; and 2) the texture, or conformational disposition, which is related to the spatial distribution of the gray tones [2]. As an example, Fig. 1a and c depict gray-scale images showing the same elements in two different spatial dispositions. The image of configuration 1 (Fig. 1a) contains different tones of gray, well represented by the corresponding gray-tone histogram

(Fig. 1b). Up to five main tones can be distinguished: one at zero level (black) and gray tones centered at 75, 150, 220 and 240. Another configuration of the same elements is depicted in Fig. 1c. This image contains the same elements as the one in Fig. 1a but in another spatial disposition. Assuming that both images contain the same number of pixels, the same elements and the same pixel resolution, the histograms resulting from both images will contain exactly the same information; therefore, their histograms (Fig. 1b and d) are identical. The tone levels characterize the complexity of the color distribution. Nevertheless, tone alone is usually insufficient to characterize an image, since no differences can be assessed between such images. This difference is what constitutes the texture of the images. The texture of an image can be defined as the spatial or conformational disposition of the different gray levels in a specific neighborhood of the image, and it is characterized as a function of the spatial variation of groups of pixel intensities in the image [3]. Thus, texture can be categorized, depending on the spatial distribution of the different elements, as strongly ordered, weakly ordered, disordered or compositional. It is well known that an image can contain different textures.

https://doi.org/10.1016/j.chemolab.2017.10.002
Received 25 May 2017; received in revised form 25 August 2017; accepted 6 October 2017; available online 8 October 2017.
© 2017 Elsevier B.V. All rights reserved.


Fig. 1. a) Configuration 1 of the four elements and b) its corresponding histogram of the pixel values. c) Configuration 2 of the four elements and d) its corresponding histogram of the pixel values.

Therefore, one texture in an image is a region that repeats itself in a spatial distribution, rather than a global property. Consequently, two questions must be posed:

- How to detect the regions that repeat themselves within an image.
- How to "quantify" and differentiate the spatial disposition of those regions.

Different methods have been developed to characterize the spatial distribution of the pixel intensities in a pixel-wise manner, aiming at obtaining statistical parameters that differ between images containing the same elements in different spatial dispositions (texture features) [4]. Texture feature extraction methods can be classified into four categories: (i) statistical, describing the texture of image regions by means of high-order moments of their histograms; (ii) structural, defining texture through well-defined compositional elements (e.g., the spatial regularity of parallel lines); (iii) model-based, which builds an empirical model of each pixel of the image; and (iv) transform-based, which converts the image into another form using properties of spatial frequencies (e.g., the wavelet transform) [5]. One of the most accepted approaches is the one postulated by Haralick et al. (1973), known as the Gray-Level Co-occurrence Matrix (GLCM). It is based on the calculation of a set of statistical features (angular second moment, contrast, correlation, variance, inverse difference moment, sum average, entropy, energy, etc.) not on the original gray-scale image, but on the matrices generated by considering one pixel, its gray level, and the levels of the surrounding pixels [2]. Like GLCM, texture feature extraction methods are mostly based on sub-dividing the image into smaller images and studying different properties of those sub-sets and their correlation. This sampling approach, known as the scale of scrutiny, becomes a critical step to ensure that no relevant information is omitted [6]. Lacey (1954) discussed the concepts of complete and partial mixtures using binary samples and statistical expressions to define the state of a mixture, with a theoretical definition of the intensity of segregation based on the variance of the numerical fraction of one particle type in a binary mixture [6,7]. This hypothesis was revisited by the Macropixel Analysis (MA) methodology introduced by Hamad et al. [8]. MA is a method based on scrutinizing an image at different sub-levels (macropixels) and then generating what is known as the homogeneity curve. The working procedure is a straightforward application of the box-counting methodology for fractal calculations [9]; the big difference between the two methods is that MA calculates the pooled standard deviation of the generated sub-windows. Two scanning methods were developed for MA: "Discrete-Level Tiling" (DLT), which uses non-overlapping tiles, and the "Continuous-Level Moving Block" (CLMB), which uses all possible macropixels in the image. At the end of the analysis, a homogeneity curve is generated as a representation of the texture of the image. Despite its great potential, little research has been conducted to fully understand the behavior and the main characteristics of this curve. This first paper studies the behavior of the homogeneity curve in depth, in order to assess the properties of the "perfectly homogeneous sample". In addition, the evaluated examples will demonstrate that the homogeneity curve meets the model and formulation stated by Lacey [7] when random distributions conform an image, and will show the behavior of the homogeneity curve for different cases. To fulfill these objectives, some premises must be taken into consideration:

1) The elements conforming an image are indivisible.
2) The image only contains two values (binary images, with 0 or 1 as object values).


Once the homogeneity curve has been parametrized, a unique homogeneity percentage is postulated, using the self-information contained in the image. Different simulations have been developed accounting for the diverse cases that may represent homogeneity in an image. Moreover, the proposed methodology is validated on real cases as well: the first with a simulated mixing analysis using images of round objects at different concentrations, and the second with real images.

2. Macropixel analysis

The Continuous-Level Moving Block (CLMB) method is based on the consecutive sub-sampling of an image into sub-sample windows (also called macropixels) of different sizes, taking as many sub-sample windows of each size as possible [8]. Once the image has been sectioned, the pooled standard deviation is calculated for each macropixel size. Having an image of dimensions (I × J), each sub-sample of window size (I' × J') can be denoted as $S_{I'J'}$, where I' = J' (for simplification, only square sub-samples are considered). The number of pixels of each sub-sample $S_{I'J'}$ is defined in Equation (1):

$PIX_{S_{I'J'}} = I' \times J'$   (1)

The total number of sub-samples for each window size, $TOTAL_{S_{I'J'}}$, is calculated as follows:

$TOTAL_{S_{I'J'}} = [I - (I' - 1)] \times [J - (J' - 1)]$   (2)

Fig. 2. Top: two examples of sub-windows (2 and 3 squared pixels); bottom: the corresponding values of the mean standard deviation of all the sub-windows.

Fig. 3. Simulation of a case in which the concentration of white pixels is the same in the three pictures but their distribution is different.

Now, the standard deviation of each sub-sample $S_{I'J'}$ can be calculated with Equation (3):

$STD_{S_{I'J'}} = \sqrt{\dfrac{\sum_{i'=1}^{I'} \sum_{j'=1}^{J'} \left( s_{i'j'} - \bar{s}_{S_{I'J'}} \right)^2}{PIX_{S_{I'J'}} - 1}}$   (3)

where $s_{i'j'}$ are the pixel values of the sub-sample and $\bar{s}_{S_{I'J'}}$ their mean.

Therefore, the global standard deviation for each sub-sample window size is the pooled value

$S_{w_{I'J'}} = \dfrac{\sum STD_{S_{I'J'}}}{TOTAL_{S_{I'J'}}}$   (4)

where the sum runs over all sub-samples of that window size.

For a better understanding of the methodology, Fig. 2 shows two examples. Given an image of 10 × 10 pixels (I × J), the methodology calculates the standard deviation of each sub-sample that can be formed for each sub-window size. In the red example, the selected sub-window is 2 × 2 (I' × J'), accounting for a total of 4 pixels per sub-sample ($PIX_{S_{22}} = 4$). Consequently, the total number of sub-samples that can be formed, $TOTAL_{S_{22}}$, is 81. The same numbers can be calculated for a sub-window of 3 × 3 pixels (green example), accounting for 64 sub-samples (Fig. 2b). These $S_{w_{I'J'}}$ values are calculated for every possible macropixel size and then projected against r, defined as

$r = PIX_{S_{I'J'}} / (I \times J)$   (5)

As we will see further on, the image size plays an important role; it is therefore important that the calculated standard deviation values are comparable between images. The r parameter makes this comparison possible, since the $STD_{S_{I'J'}}$ values are plotted against the relative size of the macropixel. In the example of Fig. 2, when the macropixel size is 2 the r value is 0.04, whereas when the macropixel size is 10 the r value is 1. In this way, images of different sizes can be compared. As mentioned above, the homogeneity curve is the result of applying the CLMB analysis to evaluate the mean standard deviation of the macropixels in an image. To illustrate the complexity of the homogeneity curve, a simple example is depicted in Fig. 3, where three images are shown having the same concentration of white color (3%) but elements of different sizes distributed in a random manner. This could represent a case in which 3% of a powder is being blended (case 1). This element, assuming that its minimum particle size is 1 pixel², will become more distributed as the blending process runs. Calculating the homogeneity curve at three different stages of the blending process (A, B and C), it can be observed that the curve changes dramatically. Logically, the value of $S_w$ when r has its minimum value (0.01) is the same for the three images, since the concentration of the element remains constant. Nevertheless, the shape of the curve changes as r increases towards 1. The change in the homogeneity curve is usually ascribed to a change in homogeneity; however, it can also be associated with the fact that the elements of the three images change their size while keeping the concentration constant (case 2). Therefore, three important features have to be evaluated to fully understand the homogeneity curve (a computational sketch follows the list):

i. the final distribution of the objects on the image surface;
ii. the object size/image size ratio;
iii. the global concentration of the objects.
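To make the procedure concrete, the following minimal Python/NumPy sketch computes the homogeneity curve from Equations (1)–(5). It is an illustrative reimplementation, not the authors' code (their reference implementation is an in-house MATLAB routine); the function and variable names are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def homogeneity_curve(img):
    """CLMB homogeneity curve of a 2-D image: returns (r, Sw) arrays."""
    img = np.asarray(img, dtype=float)
    I, J = img.shape
    r_vals, sw_vals = [], []
    for w in range(2, min(I, J) + 1):            # square windows, I' = J' = w
        # All overlapping w x w sub-samples; their count is
        # TOTALS = [I - (w - 1)] * [J - (w - 1)]   (Equation (2))
        blocks = sliding_window_view(img, (w, w)).reshape(-1, w * w)
        # Equation (3): sample standard deviation of each sub-sample
        # (denominator PIXS - 1, hence ddof=1)
        stds = blocks.std(axis=1, ddof=1)
        # Equation (4): pooled value = sum of the STDs divided by TOTALS
        sw_vals.append(stds.sum() / len(stds))
        # Equation (5): relative macropixel size, r = PIXS / (I * J);
        # e.g. w = 2 in a 10 x 10 image gives r = 4/100 = 0.04 (Fig. 2)
        r_vals.append((w * w) / (I * J))
    return np.array(r_vals), np.array(sw_vals)
```

For the 10 × 10 image of Fig. 2, this reproduces the counts used in the example: 81 sub-samples of size 2 × 2 (r = 0.04) and 64 of size 3 × 3.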

3. Material and methods

3.1. Simulations and CLMB algorithm

All simulations of the images and the CLMB algorithm were developed in MATLAB v2015b (The MathWorks, MA, USA) using in-house routines, available upon request ([email protected]). A total of four different cases have been studied, accounting for the diverse situations that may represent homogeneity in an image (an illustrative simulation sketch is given after the case descriptions):

Case 1: This study evaluates the homogeneity curve when deterministic structures are found in an image. In this case, the homogeneity curve will give an account of the different structured dispositions of an image.

Case 2: The second example investigates the homogeneity curve when the elements in the image are randomly distributed. The elements are indivisible, and therefore the pixel values do not change within the images. Different simulations have been performed by varying the concentration of elements from 1 to 95%, expressed as the percentage of pixels of the image containing one element. Since the number of random images that a single concentration and element size can generate is very high, a total of 1000 simulations have been run for each concentration.

Case 3: The third study evaluates the behavior of the curve when both the concentration of indivisible elements and their size ratio with the image vary. For this purpose, 1000 binary images with elements ranging in concentration from 1 to 50% and with element size/image size ratios of 0.02, 0.03, 0.04, 0.05, 0.07 and 0.10 have been simulated.

Case 4: This example evaluates the homogeneity curve for extremely heterogeneous samples, without variation in pixel values and only having different concentrations.

3.2. Real examples

The real cases are:

Case 5: Round objects of the same size have been mixed in a squared box. Different concentrations (10, 20, 30 and 40%) have been used to evaluate the mixing process. This example uses RGB images obtained with a digital camera at different blending times.

Case 6: 244 images of different yoghurt samples (full- and low-fat yoghurts without fat replacer) were obtained using Confocal Laser Scanning Microscopy (CLSM). The object information in the images is related to the protein content of the yoghurt [10].
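The exact placement rules of the simulations are not spelled out in the text, so the sketch below is only one plausible reading of Cases 2 and 3, and all names are illustrative: indivisible square elements of side k are dropped at random, non-overlapping positions until the target concentration is reached (suitable for the dilute concentrations studied here).

```python
import numpy as np

def random_binary_image(size=100, k=1, concentration=0.10, seed=None,
                        max_tries=1_000_000):
    """Random binary image with indivisible k x k elements (illustrative)."""
    rng = np.random.default_rng(seed)
    img = np.zeros((size, size), dtype=np.uint8)
    target = int(round(concentration * size * size))   # white pixels wanted
    tries = 0
    while img.sum() < target and tries < max_tries:
        i, j = rng.integers(0, size - k + 1, size=2)
        if not img[i:i + k, j:j + k].any():            # keep elements whole
            img[i:i + k, j:j + k] = 1
        tries += 1
    return img
```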


Fig. 5. Examples of random images for different concentrations: (a) 1%, (b) 25%, (c) 50% and (d) 90%.

Fig. 6. Homogeneity curves for the 20 simulated concentrations of homogeneous samples.



Fig. 7. a) Image with 10% of singular elements and b) the corresponding homogeneity curve (solid line) with the respective estimated model (o).

4. Results and discussion

4.1. Case 1: homogeneous but not random distribution

The most straightforward way to start is to study the behavior of the homogeneity curve in a "perfect" non-stochastic distribution of elements, e.g., a chess-board distribution, where the concentration of elements is 50% (Fig. 4). The homogeneity curve depicted in Fig. 4 has different characteristic points, which offer valuable information. The curve decreases in value until it reaches the first minimum. This point belongs to the window size with a ratio of 0.02. This window size covers 2 black and 2 white squares, so the standard deviation equals zero. Therefore, this point indicates the minimum squared sub-window into which the image can be divided while ensuring a perfect ratio between black and white elements in all the sub-windows. However, there is another point, at sub-windows with a ratio of 0.03, in which the curve reaches a local maximum. That point indicates an unbalance in the standard deviation. This pattern of local zeros and maxima is repeated at the different scales of scrutiny, showing that the homogeneity curve does not necessarily decay monotonously as the size of the sub-window increases. The curve also depends on the geometry of the distribution.

Fig. 4. Chess-board distribution and its homogeneity curve.

4.2. Case 2: stochastic distribution of the elements at pixel level, varying the concentration of elements

Considering an element size ratio of 0.01, invariant during the simulations, 1000 random images were simulated for each concentration of elements (from 1 to 95%, in steps of 5%). Fig. 5 presents the mean homogeneity curve (solid) and the standard deviation (dotted) for four concentrations. The mean curves of the 1000 simulations for the 20 concentrations are shown in Fig. 6. All curves present similar profiles, considering the values of $S_w$ for all window sizes. This similarity in pattern evidences that a random distribution is needed to achieve a homogeneous surface: the high $S_w$ values obtained at small window sizes are reduced to near zero as the window size increases and, contrary to the chess-board picture of section 4.1, all homogeneity curves descend monotonically towards zero. Lacey evaluated theories of particle mixing, verifying Brothman's theory that the interface area of two materials increases with time during a mixing process, until it reaches a limit value when the material is completely mixed [7], corresponding to a random distribution. The curves in Fig. 6 differ mainly in the standard deviation values at the lower scales of scrutiny. When a ratio of 0.2 is reached, the $S_w$ values decrease to almost the same range, which can be related to reaching the interface area value that corresponds to the random distribution. It is therefore plausible to propose a model (Equation (6)) to better understand the curves obtained for stochastic distributions:

$S_w = \dfrac{P_1 \, r + P_2}{r + q_1}$   (6)

where $P_1$ and $P_2$ are the main parameters, $S_w$ the standard deviation value related to the window size, and $q_1$ a constant. In Fig. 7, an example for 10% concentration is depicted, where the blue line represents the curve and the red dots the fit of the developed model. The proposed Equation (6) proved to be a suitable model, providing the homogeneous target curve. Evaluating the parameters of Equation (6) for all plausible concentrations, the evolutions of $P_1$ and $P_2$ present a parabolic form, following a second-order polynomial (y = ax² + bx + c), with a negative coefficient (a < 0) for $P_1$ and a positive one for $P_2$, while $q_1$ can be considered a constant (Fig. 8). These results are very remarkable. First of all, the behavior of the curve from 1 to 50% concentration is the same as from 51 to 100%, which is expected for binary samples. More importantly, the homogeneity curve can be parametrized at all levels of concentration. Therefore, the next logical step is to ascertain the parametrization when not only the concentration but also the relative size of the elements changes in the image. An illustrative fitting sketch is given below.
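As a hedged sketch of the parametrization (the paper does not state which solver was used), Equation (6) can be fitted with SciPy's curve_fit; the demonstration values of P1, P2 and q1 are taken from Table 1 (clump size 3, 10% concentration), and everything else below is an assumption.

```python
import numpy as np
from scipy.optimize import curve_fit

def eq6(r, p1, p2, q1):
    # Equation (6): Sw = (P1*r + P2) / (r + q1)
    return (p1 * r + p2) / (r + q1)

# Synthetic target curve built from Table 1 (clump size 3, 10% concentration)
r = np.linspace(0.01, 0.90, 90)
sw = eq6(r, 0.0092, 1.0985, 2.7080)
sw += 0.002 * np.random.default_rng(1).standard_normal(r.size)  # mild noise

(p1, p2, q1), _ = curve_fit(eq6, r, sw, p0=(0.01, 1.0, 3.0))
print(p1, p2, q1)   # should recover roughly 0.0092, 1.0985, 2.7080
```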

Fig. 8. Adjusted parameters of the developed model (Equation (6)) for homogeneous samples.



Fig. 9. Examples of images with the same concentration (30%) and different object sizes.


Fig. 10. Homogeneity curves for different object sizes.


Fig. 11. Surface values of the model parameters as a function of clump size and concentration: (a) P1, (b) P2 and (c) q1.

Table 1. Parameter values (Equation (6)) for different concentrations (columns, %) and clump sizes.

Clump size  Param       1        5       10       15       20       25       30       35       40       45       50
1           P1       0.0008   0.0017   0.0024   0.0028   0.0031   0.0033   0.0036   0.0037   0.0038   0.0039   0.0039
1           P2       0.1098   0.2408   0.3314   0.3947   0.4406   0.4770   0.5044   0.5238   0.5384   0.5503   0.5496
1           q1       0.0952   0.0970   0.0969   0.0974   0.0938   0.0939   0.0928   0.0905   0.0912   0.0982   0.0914
2           P1       0.0018   0.0039   0.0052   0.0063   0.0069   0.0073   0.0077   0.0079   0.0080   0.0080   0.0081
2           P2       0.2347   0.5053   0.6853   0.8011   0.8873   0.9548   1.0017   1.0325   1.0562   1.0699   1.0768
2           q1       1.3449   1.3216   1.3057   1.2798   1.2665   1.2620   1.2464   1.2242   1.2085   1.1896   1.1710
3           P1       0.0032   0.0068   0.0092   0.0106   0.0118   0.0124   0.0130   0.0133   0.0134   0.0135   0.0135
3           P2       0.3760   0.8101   1.0985   1.2844   1.4280   1.5151   1.5829   1.6413   1.6746   1.6928   1.6944
3           q1       2.7736   2.7436   2.7080   2.6758   2.6680   2.6126   2.5715   2.5600   2.5245   2.4836   2.4329
4           P1       0.0048   0.0105   0.0141   0.0163   0.0177   0.0189   0.0194   0.0198   0.0200   0.0201   0.0203
4           P2       0.5346   1.1777   1.5826   1.8489   2.0383   2.1717   2.2609   2.3274   2.3640   2.4030   2.4129
4           q1       4.4419   4.4349   4.3592   4.3067   4.2426   4.1880   4.1172   4.0613   3.9848   3.9570   3.8978
5           P1       0.0071   0.0151   0.0204   0.0233   0.0254   0.0267   0.0275   0.0283   0.0284   0.0281   0.0281
5           P2       0.7416   1.6042   2.1650   2.5144   2.7696   2.9288   3.0563   3.1431   3.1840   3.2024   3.2232
5           q1       6.3904   6.3732   6.3089   6.1993   6.1278   5.9971   5.9200   5.8353   5.7206   5.6108   5.5463
7           P1       0.0128   0.0271   0.0358   0.0405   0.0439   0.0465   0.0484   0.0485   0.0496   0.0487   0.0485
7           P2       1.2309   2.6424   3.5526   4.0862   4.4726   4.7638   4.9807   5.0646   5.1861   5.1606   5.1741
7           q1      11.3477  11.2120  11.0673  10.7705  10.5772  10.3694  10.2827  10.0279   9.9663   9.6694   9.5093
10          P1       0.0243   0.0524   0.0687   0.0788   0.0843   0.0892   0.0898   0.0922   0.0916   0.0906   0.0874
10          P2       2.2027   4.8519   6.4115   7.4345   8.0332   8.5598   8.7101   8.9705   9.0135   8.9984   8.8003
10          q1      20.6053  20.9945  20.3851  20.0867  19.5051  19.3346  18.6166  18.4501  17.9579  17.5232  16.8119


4.3. Case 3: evaluating random distributions with object size and concentration variation

To verify that the curve can be parametrized at different element sizes, simulations varying the object size and the concentration were performed. The object sizes evaluated have ratios (object size vs. total image size) of 0.01, 0.02, 0.03, 0.04, 0.05, 0.07 and 0.10, for 11 concentrations in the range of 1–50%, with 1000 simulations for each concentration. To build the homogeneity curve using CLMB, the window size ratios range from 0.01 to 0.90. Fig. 9 shows several examples of the simulated images. Fig. 10 shows the resulting homogeneity curves for the different object sizes at different concentrations. With bigger object sizes, higher $S_w$ values are obtained, owing to the value of the individual particles and their reduced number, leading to deviations from the ideal homogeneity curve for less homogeneous distributions. The size variation of the objects makes it difficult to determine single particles, because the borders become diffuse (mixing of different objects) [11], mainly with overlapping, so it cannot be verified whether one pixel contains information from one or more objects. The model works better for random samples with higher ratios than with smaller ones. A small increase of the $S_w$ values can be appreciated at bigger window size ratios. Observing the parameter values (P1, P2 and q1; Fig. 11), higher object ratios provide higher values of the constants, while maintaining the parabolic behavior of P1 and P2. The q1 parameter loses its constant shape when high object ratios are studied, showing a very low curvature; this can be verified from the variations in Table 1, related to its larger magnitudes.


Fig. 12. Simulated images for the evaluation of heterogeneity at concentrations of 5 and 50%, for a clump growing diagonally (a and b) and for a growing line (c and d).

Fig. 13. (a) Homogeneity curves for different sample concentrations simulated for heterogeneity and (b) mean and modeled homogeneity curves.

4.4. Case 4: heterogeneous distribution at pixel level varying the concentration


Heterogeneous images have been evaluated by increasing the size of the elements as a function of the concentration. The same 20 concentrations used in case 2 have been evaluated, obtaining for each concentration level the most heterogeneous distributions. Fig. 12 presents heterogeneous simulated images for 5 and 95% concentration, changing the shape and size of the clump. Fig. 13a shows a high deviation of the homogeneity curves in terms of the standard deviation ($S_w$). A mismatch with the normal pattern of the ideal homogeneity curve becomes evident: the deviation is high for the low-concentration samples, while for higher concentrations the homogeneity curve fits the ideal homogeneity curve better. A sketch of such a grouped baseline is given below.
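The paper builds its most heterogeneous references by growing a clump diagonally or as a line (Fig. 12); the sketch below keeps only the defining property, all object pixels completely grouped, by packing them into one corner block. It is an illustrative assumption rather than the authors' construction.

```python
import numpy as np

def heterogeneous_baseline(size=100, concentration=0.10):
    """All object pixels grouped into a single compact clump (0% boundary)."""
    img = np.zeros((size, size), dtype=np.uint8)
    n = int(round(concentration * size * size))   # number of object pixels
    side = int(np.ceil(np.sqrt(n)))               # smallest square holding them
    block = np.zeros(side * side, dtype=np.uint8)
    block[:n] = 1                                 # exactly n pixels set
    img[:side, :side] = block.reshape(side, side)
    return img
```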


Fig. 14. Adjusted parameters of the developed model (Equation (6)) for heterogeneous samples.

Fig. 15. Adjusted parameters of the developed model (Equation (6)) for heterogeneous samples.

Fig. 16. Images at different mixing times and the respective modeled HC, heterogeneous HC and sample HC for the concentration of 10%.

Fig. 17. Images at different mixing times and the respective modeled HC, heterogeneous HC and sample HC for the concentrations of 20 (top), 30 (middle) and 40% (bottom).


Fig. 18. Images at different mixing times and concentrations: a) time 4 at 10%; b) time 4 at 20%; c) time 5 at 30%; d) time 5 at 40%; with the respective modeled HC, heterogeneous HC and sample HC, and the symmetric pattern marked with a dashed green line and box. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The heterogeneous images presented a high deviation between the mean and the modeled homogeneity curves (Fig. 13b), and the model parameters show a scalar value deviation as well (Fig. 14). The model built from the heterogeneous simulated samples with different concentrations presents a suitable shape for the homogeneity curve, despite a high fluctuation among the different samples. The model parameters are a confident indication that the model is appropriate for the analyzed homogeneity curves. However, analyzing the parameter values obtained by the model for heterogeneity, the parameters show high scalar values, much bigger than zero, and the constant q1 loses its linear shape to present a parabolic one (Fig. 14c). The results for the heterogeneous simulated samples indicate that the model defined by Equation (6) cannot be used for homogeneity analysis, as a result of the lack of fit in the model parameters.

4.5. Development of the homogeneity percentage

The proposal of a homogeneity percentage emerged from the fact that the most homogeneous distribution at any pixel size level and at any concentration can be theoretically calculated using the parameters of Table 1. The most heterogeneous distribution cannot be calculated with the same model, and therefore its corresponding curve must be calculated using another approach. In order to cover all possible distributions, an individual and unique percentage of homogeneity is proposed. This approach is based on the fact that the homogeneity curve of any image must lie between those of the most homogeneous and the most heterogeneous distributions of its elements, using the self-information carried by the image. For any individual binary image, at any pixel/image ratio and at any concentration of indivisible elements:

- the most homogeneous distribution of elements is the average random distribution of the elements;
- the most heterogeneous distribution is any distribution in which the elements are completely grouped.

Therefore, a percentage of homogeneity (%H) can be calculated as in Equation (7), using the area under the curve (AUC) of three curves:

$\%H = \dfrac{AUC_{sample} - AUC_{inh}}{AUC_{h} - AUC_{inh}} \times 100$   (7)

where the corresponding curves are those of the raw image ($AUC_{sample}$), of the most homogeneous distribution ($AUC_{h}$) and of the most inhomogeneous distribution ($AUC_{inh}$). To verify the application of the %H approach, two assays with real samples have been evaluated; a sketch of the calculation is given below.
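A minimal sketch of Equation (7); the trapezoidal rule for the AUC integration is an assumption, since the paper does not state which quadrature its MATLAB routine uses.

```python
import numpy as np

def homogeneity_percentage(r, sw_sample, sw_homogeneous, sw_inhomogeneous):
    """%H of a sample curve between its two boundary curves (Equation (7))."""
    auc_sample = np.trapz(sw_sample, r)
    auc_h = np.trapz(sw_homogeneous, r)      # 100% boundary (random)
    auc_inh = np.trapz(sw_inhomogeneous, r)  # 0% boundary (grouped)
    return 100.0 * (auc_sample - auc_inh) / (auc_h - auc_inh)
```

Values slightly below 0% (as for time 02 in Fig. 16) or above 100% are possible, since the boundary curves are themselves estimates.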

4.6. Case 5: mixing analysis with objects of defined size

RGB images of spheres of the same size in a box of fixed size were taken with a digital camera. The spheres simulated different mixing procedures at concentrations (numbers of spheres) of 10, 20, 30 and 40%. The pictures for each mixing time were evaluated using the proposed methodology (Fig. 15). Fig. 16 shows the results obtained when the concentration of spheres is 10%. The first image belongs to time 0 and illustrates the most heterogeneous distribution for this concentration, having all the objects concentrated in a corner of the image. Based on this, the black curve (the inhomogeneous distribution curve) can be calculated. Nevertheless, we must stress that the advantage of this methodology is precisely that the inhomogeneous curve can be easily simulated without needing this first picture. According to the object concentration of the image, the curve corresponding to the most homogeneous disposition can also be calculated, using the parameters in Table 1 (red curves in Fig. 16). These two curves define the boundaries of the percentage of homogeneity, the red curve being 100% homogeneity and the black curve 0%. As the blending of the spheres progresses, the homogeneity curve at each time (blue curve) gets closer to the red (homogeneous) one. This shows how a percentage of homogeneity can be calculated for each of the blending times. Time 02 presents a negative %H value (−1%). This is plausible, since the most inhomogeneous curve, as we have seen, is calculated assuming only one of the plausible inhomogeneous distributions. The results obtained for the higher concentrations (20, 30 and 40%) are presented in Fig. 17. For these concentrations, higher %H values have been obtained, because a proportion closer to 1:1 (object/diluent) is more suitable for achieving an appropriate random distribution. Observing the images and the homogeneity curves, increasing the blending time drives the HC towards the model (target) curve, i.e., towards a higher degree of homogeneity and a more random distribution of the objects in the image. Apart from providing a homogeneity percentage, the curve also offers structural information. Fig. 18 shows high $S_w$ values even when the HC values tend to zero: a symmetric pattern appears, revealed by this rise in the $S_w$ value. The green dashed line in the image visualizes the symmetric pattern, and in the HC the disturbance is marked by the green dashed box, as discussed in case 1 (homogeneous but not random distribution).

Fig. 19. CLSM images with A) 3.5% w/w of protein and 5% w/w of fat content, and B) 1.5% w/w of protein and 0.5% w/w of fat content.

Fig. 20. (a) Homogeneity curves from all samples (object; homogeneous distribution in green, inhomogeneous distribution in red), (b) the curves from a sample with 95.7% homogeneity and (c) the homogeneity percentage (%H) for all samples. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Fig. 21. ANOVA results for %H against (a) protein level, (b) fat concentration and (c) reference yoghurt.


4.7. Case 6: yoghurt structure analysis

A total of 244 CLSM images of different yoghurt samples have been evaluated to verify their homogeneity in terms of protein content. Different formulations of full- and low-fat yoghurts without fat replacer, from one commercial brand and three lab formulations, have been used for this purpose [10]. The images were binarized using a dynamic threshold, so that only the object information entered the analysis; an example of the treated images is depicted in Fig. 19 (a preprocessing sketch is given below). The homogeneity curves were calculated for the raw, the randomized (homogeneous) and the heterogeneous image of each of the 244 samples (Fig. 20a), and the $AUC_h$ and $AUC_{inh}$ were acquired as mentioned before. Fig. 20b shows the three homogeneity curves of a single sample with 95.7% homogeneity. To ensure that the most representative curve of the homogeneous situation was achieved, each image was randomized 200 times and the homogeneity curve was calculated over the mean curve. The parameters of the homogeneous curves were calculated to assess the randomization step, and an ANOVA of %H against protein level, fat concentration and reference yoghurt showed significant differences between the effects. Regarding the formulations used, the homogeneity percentage decreases as the protein level increases (Fig. 21a); a high fat concentration promotes a better homogeneity result (Fig. 21b); and the reference yoghurt shows better homogeneity than the manipulated ones (Fig. 21c).
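A sketch of the Case 6 preprocessing under stated assumptions: the paper only says a "dynamic threshold" was used, so Otsu's method stands in for it here, and the 200-fold pixel shuffling is one way to realize the described randomization (it reuses homogeneity_curve from the earlier sketch).

```python
import numpy as np
from skimage.filters import threshold_otsu  # stand-in for the unspecified "dynamic threshold"

def binarize(gray):
    """Binary object mask of a gray-scale CLSM image (illustrative)."""
    return (gray > threshold_otsu(gray)).astype(np.uint8)

def randomized_mean_curve(binary_img, n_shuffles=200, seed=0):
    """Mean homogeneity curve of pixel-shuffled (random, homogeneous) copies."""
    rng = np.random.default_rng(seed)
    curves = []
    for _ in range(n_shuffles):          # same concentration, random layout
        flat = binary_img.ravel().copy()
        rng.shuffle(flat)
        r, sw = homogeneity_curve(flat.reshape(binary_img.shape))
        curves.append(sw)
    return r, np.mean(curves, axis=0)
```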

5. Conclusions

In this first part, the definitions of homogeneous and heterogeneous distributions of elements in binary images have been comprehensively revisited. In addition, a new index is proposed to calculate the percentage of homogeneity of an image using only the self-information contained in the image. The homogeneity is evaluated by considering the mixing theories that lead to the assessment of the ideal and non-ideal homogeneous distributions. The evaluation has been made using the macropixel theory, giving a so-called homogeneity curve (HC) and calculating this curve for perfectly controlled and known situations. The HC also demonstrates features concerning deviations from the ideal situation within an image. Therefore, the Lacey theory is demonstrated and taken as the situation in which an image is ideally homogeneous. The presented results for all simulations provide some relevant points:

(i) A "perfect" non-stochastic distribution (symmetric patterns) does not provide the best homogeneity distribution. On the contrary, the random distribution is associated with the ideal homogeneous distribution.
(ii) The smaller the object size/image size ratio, the closer an ideal homogeneous situation can be achieved.
(iii) The proposed model (Equation (6)) is suitable only when homogeneous distributions are studied. The previous points allow the theoretical calculation of the ideal homogeneous image (and, thus, its homogeneity curve) knowing the concentration of the elements and their size relative to the size of the image. Moreover, the most heterogeneous situation can be postulated for every case and its corresponding curve calculated as well. With these two curves, plus the image's own homogeneity curve, a homogeneity percentage can be calculated.
(iv) The homogeneity percentage (%H) proposed herein demonstrates to be a reliable measurement for evaluating the homogeneity of binary images, being an easy and user-friendly method for the practical evaluation of, for instance, blending processes or mixing issues. This %H has been successfully evaluated for both simulated and real images.

Acknowledging that most real-life situations do not concern binary images but grayscale ones, the second part of this study will focus on the adaptation of the proposed homogeneity percentage to grayscale images.

Funding agencies

NUQAAPE – FACEPE (APQ-0346-1.06/14); INCTAA (Processes nº: CNPq 573894/2008-6; FAPESP 2008/57808-1); Núcleo de Estudos em Química Forense - NEQUIFOR/CAPES (Process no. 3509/2014); and CNPq (Process no. 400264/2014-5).

References

[1] W.K. Pratt, Image feature extraction, in: W.K. Pratt (Ed.), Digital Image Processing: PIKS Inside, third ed., John Wiley and Sons, Inc., New York, USA, 2001, pp. 509–550.
[2] R.M. Haralick, K. Shanmugam, I. Dinstein, Textural features for image classification, IEEE Trans. Syst. Man Cybern. 3 (1973) 610–621, https://doi.org/10.1109/TSMC.1973.4309314.
[3] M.H. Bharati, J.F. MacGregor, Texture analysis of images using principal component analysis, in: SPIE/Photonics Conference on Process Imaging for Automatic Control, Boston, 2000, pp. 27–37.
[4] E. Aptoula, S. Lefèvre, Morphological texture description of grey-scale and color images, in: P. Hawkes (Ed.), Advances in Imaging and Electron Physics, Academic Press, 2011, pp. 1–74, https://doi.org/10.1016/B978-0-12-385981-5.00001-X.
[5] M.H. Bharati, J.J. Liu, J.F. MacGregor, Image texture analysis: methods and comparisons, Chemom. Intell. Lab. Syst. 72 (2004) 57–71, https://doi.org/10.1016/j.chemolab.2004.02.005.
[6] J.-M. Missiaen, G. Thomas, Homogeneity characterization of binary grain mixtures using a variance analysis of two-dimensional numerical fractions, J. Phys. Condens. Matter 7 (1995) 2937–2948, https://doi.org/10.1088/0953-8984/7/15/002.
[7] P.M.C. Lacey, Developments in the theory of particle mixing, J. Appl. Chem. 4 (1954) 257–268, https://doi.org/10.1002/jctb.5010040504.
[8] M.L. Hamad, C.D. Ellison, M.A. Khan, R.C. Lyon, Drug product characterization by macropixel analysis of chemical images, J. Pharm. Sci. 96 (2007) 3390–3401, https://doi.org/10.1002/jps.
[9] N. Sarkar, B.B. Chaudhuri, Efficient differential box-counting approach to compute fractal dimension of image, IEEE Trans. Syst. Man Cybern. 24 (1994) 115–120, https://doi.org/10.1109/21.259692.
[10] I.C. Torres, J.M. Amigo Rubio, R. Ipsen, Using fractal image analysis to characterize microstructure of low-fat stirred yoghurt manufactured with microparticulated whey protein, J. Food Eng. 109 (2012) 721–729.
[11] P.V. Danckwerts, The definition and measurement of some characteristics of mixtures, Appl. Sci. Res. 3 (1952) 279–296.