Multispectral skin patterns analysis using fractal methods



Accepted Manuscript

Multispectral skin patterns analysis using fractal methods
Karol Przystalski, Maciej J. Ogorzałek

PII: S0957-4174(17)30480-3
DOI: 10.1016/j.eswa.2017.07.011
Reference: ESWA 11427

To appear in: Expert Systems With Applications

Received date: 24 September 2016
Revised date: 4 July 2017
Accepted date: 10 July 2017

Please cite this article as: Karol Przystalski, Maciej J. Ogorzałek, Multispectral skin patterns analysis using fractal methods, Expert Systems With Applications (2017), doi: 10.1016/j.eswa.2017.07.011

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Highlights
• A solution for skin cancer pattern recognition using fractal methods on multispectral images is proposed.
• Lacunarity and box dimension are used to characterize the patterns.
• Multiple image binarization methods are tested together with different classification methods.
• The proposed solution shows that fractal methods can be combined with binarization methods for some skin cancer patterns.

Karol Przystalski a,∗, Maciej J. Ogorzałek a

a Jagiellonian University, Łojasiewicza 11, 30-348 Kraków, Poland

Abstract

Melanoma is widely known as one of the most dangerous cancers. Over the past few decades, technological improvements have made it possible to introduce more advanced diagnostic tools for melanoma. Unfortunately, even though better tools are available, diagnosis accuracy is still unsatisfactory. Hundreds of papers have proposed ways to improve melanoma diagnosis accuracy, including a range of imaging and image analysis techniques. Some of the best diagnosis results are obtained using multilevel SIAscope images, but even with this method there is still room for improvement. In this paper, we propose additional discriminative features, namely box dimension and lacunarity, calculated from a database of multilevel images. The goal of this paper is to show the usefulness of fractal methods combined with multilevel images and binarization methods for skin cancer pattern recognition. The results were compared to an assessment of each feature of Hunter's scoring method, which doctors commonly use as a diagnostic indicator. The results indicate the usefulness of the fractal characteristics of the geometric shapes of lesions or of specific parts of them. Compared to other research, the presented results indicate that fractal lesion characteristics can be used as one of the features taken into account in the diagnostic process.

Keywords: medical imaging, fractal analysis, malignant melanoma, spectrophotometric, skin images, dermoscopy, support vector machine, neural networks, image binarization, SIAscope, fractals, fractal dimension, box dimension, lacunarity, multilevel images, multi-layered images, skin cancer, skin lesions


1. Introduction


For many years, researchers have tried to help doctors achieve better accuracy in melanoma diagnosis; however, the latest research indicates that improvement is still needed. In (Menzies et al., 2005), GPs achieved accuracies of 62% and 63% for benign and malignant skin lesions, respectively, whereas experts achieved 90% and 59%, respectively. These results indicate that diagnosis accuracy is unsatisfactory. Currently, researchers are focused on introducing better devices (Dubovitskiy et al., 2014; Afifi et al., 2016) and diagnostic methods (Korotkov & Garcia, 2012; Sáez et al., 2015). In this paper, we apply well-known fractal, binarization and classification methods to multispectral skin lesion images. Existing research indicates that successful prediction of melanoma patterns can increase the accuracy of melanoma recognition, and previous studies have shown that fractal methods are useful for melanoma pattern detection (Maier et al., 2015; Kruk et al., 2015). In most papers, the lesion shape, color or irregularity is measured using fractal methods (Manousaki et al., 2006), but compared to (Menzies et al., 2005) the accuracy is still unsatisfactory. We investigate more patterns than just shape, color or irregularity, and we compare our results to other research that also focuses on melanoma pattern recognition (Sadeghi et al., 2013; Celebi & Zornberg, 2014). Almost all previous research on melanoma patterns was based on dermoscopy images (Sáez et al., 2014). On the other hand, some papers show the importance of multispectral/spectrophotometric images for melanoma analysis (Hacioglu et al., 2013; D'Alessandro & Dhawan, 2012; Tomatis et al., 2005). Therefore, in our research we consider the patterns used in the Hunter score (Hunter et al., 2006).

∗ Corresponding author. Email addresses: [email protected] (Karol Przystalski), [email protected] (Maciej J. Ogorzałek). Preprint submitted to Expert Systems with Applications, July 12, 2017.

This paper is divided into several sections. In the first, we explain the medical aspects of the problem, with the main part dedicated to the Hunter score (Hunter et al., 2006). Most of the standard methods used by doctors for skin cancer diagnosis are based on the analysis of patterns characteristic of melanoma. The currently most popular diagnostic method was introduced by (Stolz et al., 1994) and is often referred to as the ABCD score. Several algorithms have been developed for computer-assisted detection of ABCD features (Pires & Barcelos, 2007; Lee & Claridge, 2005; Celebi et al., 2008). Other methods, such as the 3-point (Soyer et al., 2004), CASH (Henning et al., 2007), ChaosClue (Kittler et al., 2008), 7-point (Argenziano et al., 2011) and Menzies (Menzies et al., 1996) scores, treat melanoma diagnosis in a similar way. Most of these methods are based on various shape/structure patterns that indicate melanoma; a more complete list of patterns can be found in (Soyer et al., 2010). Among these scoring methods, only the Hunter score is used with SIAscope and multispectral skin images, which is why the patterns of this scoring method are considered in this paper. The following section is dedicated to the processing pipeline and is divided into three parts: image binarization, feature extraction, and classification. Unlike many other research papers (Abuzaghleh et al., 2015; Abbadi & Miry, 2014), in which only one or a few binarization methods are used, we tested the 37 most popular binarization methods to obtain the best possible results. Binarization is performed for each image with each method, so we obtain 37 binary images for each image considered in table 2. The classification features are based on two popular fractal methods: box-counting dimension and lacunarity. Fractal methods for pattern recognition have been applied in previous research (Chen et al., 1993) and have been shown to be useful in melanoma pattern recognition (Kockara et al., 2015). Even though fractal methods have given good results in melanoma pattern recognition, they have never been used on multispectral images. Classification is performed using two methods that are known to give good results in image-based pattern recognition. In the results section, we show the classification accuracy, specificity, and sensitivity of the pattern recognition process. For comparison, we also present the results of Hunter semi-score classification. As a performance indicator, we show the AUC value for each Hunter score pattern. We also list the binarization methods that achieve the best results. Finally, we compare our results with past research. We show that in a few cases the proposed approach gives better results than the methods studied in previous research. Overall, the results show that combining fractal methods with binarization methods, commonly used classification methods, and multispectral skin lesion images can improve melanoma diagnosis.

2. Medical aspects of this paper

Dermoscopy has evolved over the last twenty years. Doctors have moved from classic dermatoscopes (Heine et al., 2004) to much more advanced devices such as video dermatoscopes (Meyer, 1998), mobile dermatoscopes (Mullani, 2004), or devices like SIAscope (Cotton et al., 2009; Moncrieff et al., 2002). Apart from the Hunter score, the aforementioned scoring methods use regular dermatoscopy images of skin lesions, which can be taken with a classic, mobile or video dermatoscope. SIAscope is one of the most advanced devices used for skin cancer diagnosis, as it acquires five images of a lesion using different light wavelengths during an examination. The process of image creation in SIAscope is explained in figure 1. A regular dermatoscope image has a resolution of 1400x1400 pixels, whereas SIAscope images are 672x672 pixels. Using SIAscope, we obtain a regular dermatoscopy view and images of hemoglobin, melanin, collagen, and dermal melanin. These images are possible because different cell types respond differently to different light wavelengths. Based on the five images, SIAscope uses the MoleMate software to examine lesions with the Hunter score. In MoleMate, the Hunter score is obtained from a few simple questions, mostly based on patterns specific to melanoma or other skin diseases. Some patterns contributing to the Hunter score are presented in figure 2. Figure 2a shows a pattern of bright dots in a regular dermoscopy image; it can also be seen in the collagen layer. In figure 2b, the pattern looks like brain folds; this structure can be recognized in the melanin layer. The third pattern (figure 2c) is a large red object in the hemoglobin layer. Figure 2d shows a dermal melanin pattern: a darker color is visible when changes occur deeper in the skin. In figure 2e, blood vessels are shown in the hemoglobin layer. The last pattern, in figure 2f, shows blood displacement in a regular dermatoscopy image; in the hemoglobin layer, it appears as a white object surrounded by solid red.

Figure 1: Multilayer images obtained with SIAscope (panels: regular dermatoscope image, melanin layer, haemoglobin layer, collagen layer and dermal melanin layer; a schematic shows light waves interacting with melanin, blood and collagen in the skin)
The Hunter scoring method applies two questions in addition to the presence of various patterns in the image. The first concerns the size of the lesion, a common feature that is also used in other diagnostic methods: if a lesion is bigger than 6 mm in any dimension, the Hunter score is higher. The last question is related to the patient's age, which is scored from 0 to 6 points depending on the age range. Older patients get a higher score because the probability of skin cancer increases with age. The database used in this research does not contain any data about patient age. Therefore, in this research we consider only the score of seven features and call it the Hunter semi-score.

Figure 2: Melanoma patterns obtained using different light wavelengths: (a) bright dots, (b) brain pattern, (c) blood lacunes, (d) dermal melanin, (e) blood vessels, (f) blood displacement with blush. Panels a) and f) show the classic dermatoscopy view using visible light, b) the melanin view obtained with ultraviolet light, c) and e) the hemoglobin view obtained with near-infrared light, and d) dermal melanin obtained with near-infrared, ultraviolet and visible light.


3. Melanoma patterns shape detection


The process of finding and assessing features in multispectral skin images consists of several steps. First, the RGB images of the hemoglobin, collagen, melanin and dermal melanin layers are converted to binary images. Figure 5 shows the difference between the shapes of benign and malignant lesions: the shape of a malignant lesion is much more irregular than that of a benign lesion. Next, fractal methods are used to obtain the box-counting dimension and lacunarity of each binary image. A vector of these two features is used for binary classification of each pattern mentioned in the previous section. For Hunter score multiclass classification, the F-score (Chen & Lin, 2006) is used to select the most valuable features. We used binary images instead of grayscale images for performance reasons: the goal of this research is to develop very fast methods that could be used on mobile devices in the future, and calculations on binary images are usually much faster. Our tests indicate that calculating the box-counting dimension and lacunarity is up to three times faster for binary images than for their grayscale versions. Grayscale-based methods will be considered in our future work.

3.1. Image binarization

The classification is performed on a vector of fractal-based features calculated from binary images. Most papers test only a few binarization methods; to get the best results, in this research we tested many. We chose methods based on existing surveys of binarization methods (Sezgin & Sankur, 2004), removing all obsolete methods and adding some common ones. Since some methods use similar approaches, they can be divided into the following groups:
• histogram shape-based thresholding
• cluster-based thresholding
• entropy-based thresholding
• local adaptive thresholding
• attribute similarity thresholding
• spatial thresholding
In this paper, we refer to each method by the name of the first author who introduced it. All images of melanin, hemoglobin, dermal melanin and collagen are binarized using each of the above methods. For example, for collagen-based patterns, 629 binary images were generated by each binarization method.

Figure 3: An example comparison of binary images of benign (a-e) and malignant (f-j) lesions: (a) dysplastic nevus; (b) melanin, (c) dermal melanin, (d) haemoglobin and (e) collagen binary images of the dysplastic nevus; (f) malignant melanoma; (g) melanin, (h) dermal melanin, (i) hemoglobin and (j) collagen binary images of the malignant lesion. Images (b-e, g-j) were obtained using the Shanbhag (Shanbhag, 1994) binarization method. Even on visual inspection, clear differences between the binarized structures in the two cases are visible: in the malignant case (g-j), compared to the benign case (b-e), the dark structures are fragmented and show complex geometric structures.

Figure 4: Box counting of a lesion binary image for three different box sizes: (a) ε, (b) ε/2 and (c) ε/4.

The quality and performance of a binarization method can be determined in a few ways; some performance metrics are described in (Sezgin & Sankur, 2004). In our research, it is not possible to measure which pixels are wrongly assigned to the foreground or background, as we do not have reference (ground-truth) binary images. Instead, we measure the quality of a method by the accuracy, specificity, and sensitivity of the classification output. Another quality metric we use is execution time, which is almost the same for most methods. Some methods, such as (Shanbhag, 1994) or (Otsu, 1979), are faster, while others, like (Huang & Jang, 1995), are as much as 30% slower than (Otsu, 1979). In the next section, the binarization method that gives the best results for each classification pattern is shown in table 6. Some example binary images of dysplastic and malignant lesions are shown in figure 3. Figures 3b and 3g show two melanin layer binary images: for the dysplastic nevus, the object is solid, whereas the malignant lesion has two objects with much more irregular borders. Dermal melanin appears as a larger object in the dysplastic lesion (see figure 3c), while the malignant lesion has many small objects visible in the dermal melanin layer (see figure 3h). In the hemoglobin layer, a benign lesion can contain an object with a hole (see figure 3d). In this example, the malignant hemoglobin layer (see figure 3i) is very similar to the dermal melanin (see figure 3h) and collagen (see figure 3j) layers.
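As an illustration of the entropy-based group, the following is a minimal sketch of Kapur's maximum-entropy thresholding (Kapur et al., 1985), one of the three methods that give the best results in this study. It assumes that each layer image has already been converted to an 8-bit grayscale NumPy array (uint8) and that the structures of interest are darker than the background; the function names and the dark-foreground convention are our illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def kapur_threshold(gray: np.ndarray) -> int:
    """Kapur et al. (1985) maximum-entropy threshold for an 8-bit image.

    Returns the grey level t maximizing H_background(t) + H_foreground(t),
    where background = levels 0..t and foreground = levels t+1..255."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                      # normalized histogram
    best_t, best_h = 0, -np.inf
    for t in range(255):
        p0, p1 = p[: t + 1].sum(), p[t + 1:].sum()
        if p0 == 0.0 or p1 == 0.0:             # one class empty: no valid split
            continue
        q0 = p[: t + 1] / p0                   # class-conditional distributions
        q1 = p[t + 1:] / p1
        q0, q1 = q0[q0 > 0], q1[q1 > 0]        # convention: 0 * log 0 = 0
        h = -np.sum(q0 * np.log(q0)) - np.sum(q1 * np.log(q1))
        if h > best_h:
            best_t, best_h = t, h
    return best_t


def binarize(gray: np.ndarray) -> np.ndarray:
    """Boolean mask of pixels at or below the Kapur threshold (dark foreground assumed)."""
    return gray <= kapur_threshold(gray)
```

Swapping in another method from the groups above only changes how the threshold is chosen; the resulting boolean mask is what the fractal features are computed on.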

Table 1: Average standard error and correlation for Hunter patterns
Pattern | Standard error | Correlation
Bright dots | 0.1113 | 0.9806
Brain pattern | 0.0358 | 0.9997
Blood lacunes | 0.0250 | 0.9999
Dermal melanin | 0.0254 | 0.9999
Blood vessels | 0.0533 | 0.9663
Blood displacement | 0.0957 | 0.9742
Size | 0.0274 | 0.9998

3.2. Fractal methods

In this research, box-counting dimension (Chen et al., 1993) and lacunarity (Dong, 2000; Gilmore et al., 2009) are used for numeric shape representation. The box-counting dimension is calculated from the binary images as the slope of a linear regression fitted to successive approximations of the shape with scaled boxes:

$$D_p = \frac{n\sum SC - \sum S \sum C}{n\sum S^2 - \left(\sum S\right)^2}, \tag{1}$$

where $n$ is the number of box sizes, $C = \log N(\varepsilon)$, $S = \log \varepsilon$, $\varepsilon$ is the box size in pixels, and $N(\varepsilon)$ is the number of boxes filled with the binary image of the shape. The hemoglobin, collagen, dermal melanin and melanin layer images are 672x672 pixels. The slope is calculated using square boxes of 1x1, 2x2, 4x4, and so on up to 64x64 pixels, i.e. seven box sizes in total. An example of the box-counting method with three different box sizes is shown in figure 4. For the box-counting dimension, we count the filled boxes and compare their number to the total number of boxes for a given box size.

The lacunarity method is similar to box counting, but instead of counting all filled pixels, we compare each box to its neighboring boxes. This allows us to measure how many gaps (holes) are present in the image. We calculate lacunarity as follows:

$$\lambda_\varepsilon = CV_\varepsilon^2 = \left(\frac{\sigma_\varepsilon}{\mu_\varepsilon}\right)^2, \tag{2}$$

where $\sigma_\varepsilon$ is the standard deviation and $\mu_\varepsilon$ is the mean of the number of filled pixels of a given shape within a box of size $\varepsilon$. As in box counting, we calculate $CV_\varepsilon^2$ for seven different box sizes. The lacunarity is the average over all box sizes:

$$\bar{\Lambda} = \frac{1}{n}\sum_{i=1}^{n} \lambda_i. \tag{3}$$

Box sizes are limited to 64x64 pixels because for bigger boxes the standard error is too large to be considered reliable. The standard error is calculated as follows:

$$SE = \sqrt{\frac{\sum C^2 - b\sum C - D_p \sum SC}{n-2}}, \tag{4}$$

where $b$ is the intercept of the regression line:

$$b = \frac{\sum C - D_p \sum S}{n}. \tag{5}$$

Figure 5: Shape differences of benign (a) and malignant (b) lesions on zoomed binary images. The Shanbhag (Shanbhag, 1994) method was used for binarization.
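Eqs. (2) and (3) can be sketched in NumPy as follows. The sketch assumes the gliding-box variant of lacunarity (the box is moved one pixel at a time, so every box overlaps its neighbours) and the seven box sizes 1, 2, 4, ..., 64 used in this paper; the gliding-box choice and all function names are our assumptions, since the exact box placement is not specified. Box masses are computed with a summed-area table, so each box size needs only one pass over the image.

```python
import numpy as np

BOX_SIZES = (1, 2, 4, 8, 16, 32, 64)  # seven box sizes, as in the text


def box_masses(binary: np.ndarray, eps: int) -> np.ndarray:
    """Number of foreground pixels in every eps x eps gliding box."""
    # Summed-area table with a leading row/column of zeros.
    sat = np.pad(binary.astype(np.int64).cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    return sat[eps:, eps:] - sat[:-eps, eps:] - sat[eps:, :-eps] + sat[:-eps, :-eps]


def lacunarity(binary: np.ndarray, sizes=BOX_SIZES) -> float:
    """Average over box sizes of lambda_eps = (sigma_eps / mu_eps)**2, eqs. (2)-(3)."""
    lambdas = []
    for eps in sizes:
        m = box_masses(binary, eps)
        mu = m.mean()
        if mu == 0:  # empty image: lacunarity is undefined
            raise ValueError("image has no foreground pixels")
        lambdas.append((m.std() / mu) ** 2)
    return float(np.mean(lambdas))
```

A completely filled image gives a lacunarity of 0 (all boxes have the same mass), while holes and fragmented structures increase the spread of box masses and therefore the lacunarity.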

The second quality metric used in this research is the correlation, which should be close to 1.0 for the fit to be considered reliable. It is calculated as follows:

$$r^2 = \left(\frac{n\sum SC - \sum S \sum C}{\sqrt{\left(n\sum S^2 - \left(\sum S\right)^2\right)\left(n\sum C^2 - \left(\sum C\right)^2\right)}}\right)^2. \tag{6}$$
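The box-counting steps together with the two reliability checks (eqs. 1 and 4 to 6) can be sketched as below. This is a minimal NumPy version under stated assumptions: the binary image is zero-padded so that every box size tiles it (672 is not divisible by 64), a box counts as filled if it contains at least one foreground pixel, natural logarithms are used, and the image is assumed to contain at least one foreground pixel. Because S = log ε, the fitted slope of eq. (1) is negative for real shapes, so the sketch also returns its magnitude as the dimension; that sign convention and the function names are our assumptions.

```python
import numpy as np

BOX_SIZES = (1, 2, 4, 8, 16, 32, 64)


def count_filled_boxes(binary: np.ndarray, eps: int) -> int:
    """N(eps): number of non-overlapping eps x eps boxes containing foreground."""
    h, w = binary.shape
    H, W = -(-h // eps) * eps, -(-w // eps) * eps   # round up to a multiple of eps
    padded = np.zeros((H, W), dtype=bool)
    padded[:h, :w] = binary                         # background (zero) padding
    blocks = padded.reshape(H // eps, eps, W // eps, eps)
    return int(blocks.any(axis=(1, 3)).sum())


def box_counting(binary: np.ndarray, sizes=BOX_SIZES):
    """Return (dimension, slope D_p, standard error SE, r^2), eqs. (1) and (4)-(6)."""
    S = np.log(np.asarray(sizes, dtype=float))                     # S = log eps
    C = np.log([count_filled_boxes(binary, e) for e in sizes])     # C = log N(eps)
    n = len(sizes)
    sS, sC, sSC = S.sum(), C.sum(), (S * C).sum()
    sSS, sCC = (S * S).sum(), (C * C).sum()
    slope = (n * sSC - sS * sC) / (n * sSS - sS ** 2)              # eq. (1)
    b = (sC - slope * sS) / n                                      # eq. (5)
    se = np.sqrt(max(sCC - b * sC - slope * sSC, 0.0) / (n - 2))   # eq. (4)
    den = np.sqrt((n * sSS - sS ** 2) * (n * sCC - sC ** 2))
    r2 = ((n * sSC - sS * sC) / den) ** 2 if den > 0 else float("nan")  # eq. (6)
    return -slope, slope, se, r2
```

For a completely filled square the sketch returns a dimension of 2 with r² = 1, and for a single full-width line a dimension of 1; the `max(..., 0.0)` guard only absorbs tiny negative rounding errors in eq. (4).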

The averages calculated for each pattern are shown in table 1.

3.3. Classification methods

In this research, we consider two different classification problems. Hunter score pattern detection is a binary classification problem whose goal is to detect whether a pattern is present in a lesion. The detected patterns are listed in table 2. The input feature vector consists of two fractal-based features. The second classification problem concerns the Hunter semi-score based on multilevel image-based features. By semi-score, we mean the score that can be obtained only by classifying the patterns shown in table 2. The input feature vector consists of seven fractal-based features. The goal is to obtain a semi-score from zero to seven.

It is good practice to use at least two different classification methods, as there is no single best classifier: depending on the problem, each classifier behaves differently. The Support Vector Machine (SVM) method is known to give good results when the feature vector is based on images (Wu et al., 2008). The goal is to find a hyperplane with a maximum margin separating the cases of two classes. The hyperplane equation for linear separation is given as follows:

$$g(x) = w^T x + w_0, \tag{7}$$

where $w$ is the weight vector set during training:

$$w = \sum_{i=1}^{N_s} \alpha_i t_i x_i. \tag{8}$$

Here $\alpha_i$ are the Lagrange multipliers of the dual formulation, which is used to reduce the computation time (Smith), $t_i \in \{-1, +1\}$ are the class labels of the support vectors, and $N_s$ is the number of support vectors. Three types of kernel were tested, and the radial basis function (RBF) kernel gave the best results. For classification, C-SVM is used. In the case of the RBF kernel (Park & Sandberg, 1993), the weights are calculated as follows:

$$w = \sum_{i=1}^{N_s} \alpha_i t_i K(x_i, x, \sigma), \tag{9}$$

and the hyperplane equation becomes:

$$g(x) = \sum_{i=1}^{N_s} \alpha_i t_i K(x_i, x, \sigma) + w_0, \tag{10}$$

where the RBF kernel is defined as follows:

$$K(x_i, x, \sigma) = \exp\left(-\frac{\lVert x_i - x \rVert^2}{2\sigma^2}\right). \tag{11}$$

In pattern classification, the feature vector consists of two values, which is why the neural network has only two input nodes. The network is a feed-forward network (FNN) (Ercal et al., 1994) with one hidden layer of four nodes. The kernel function is used in the same way as for the SVM. The decision function is given as follows:

$$F(x) = \sum_{j=1}^{n} w_j K_j(x_i, x, \sigma). \tag{12}$$

We use the so-called π method (Kuncheva, 2014). The database was divided into training and testing data in percentage ratios of 10/90, 50/50, 60/40, 70/30 and 80/20, with the data randomly assigned to each group. Each ratio is repeated 100 times, with the training and testing data randomly chosen each time. For example, for the size pattern in the 50/50 scenario, we always randomly choose 105 lesions with the pattern and 201 without it. The accuracy metrics are calculated as the average of the 100 classification results for each ratio. The main metric is accuracy, calculated as follows:

$$\text{accuracy} = \frac{TN + TP}{TN + TP + FN + FP}. \tag{13}$$

We also use the specificity (shown in red in table 4) and the sensitivity (shown in green in table 4) of pattern classification. They are calculated as follows:

$$\text{specificity} = \frac{TN}{TN + FP}, \tag{14}$$

$$\text{sensitivity} = \frac{TP}{TP + FN}. \tag{15}$$

The last performance metric used for binary classification in this research is the area under the ROC curve (AUC). For the multiclass classification problem, we present a confusion matrix.
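Eqs. (10) and (11) can be evaluated directly once a C-SVM has been trained. The NumPy sketch below assumes that the Lagrange multipliers α_i, the labels t_i ∈ {−1, +1}, the support vectors x_i and the bias w_0 are already available from some trained model (training itself is not shown); the function names are hypothetical. A positive g(x) assigns x to the class labelled +1, here taken to mean "pattern present".

```python
import numpy as np


def rbf_kernel(xi: np.ndarray, x: np.ndarray, sigma: float) -> np.ndarray:
    """K(x_i, x, sigma) = exp(-||x_i - x||^2 / (2 sigma^2)), eq. (11).

    xi: (N_s, d) support vectors; x: (d,) query point. Returns shape (N_s,)."""
    sq_dist = np.sum((xi - x) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def svm_decision(x, support_vectors, alpha, t, w0, sigma) -> float:
    """g(x) = sum_i alpha_i t_i K(x_i, x, sigma) + w0, eq. (10)."""
    k = rbf_kernel(np.asarray(support_vectors, float), np.asarray(x, float), sigma)
    return float(np.dot(np.asarray(alpha, float) * np.asarray(t, float), k) + w0)


def predict(x, support_vectors, alpha, t, w0, sigma) -> int:
    """+1 if the pattern is predicted to be present, -1 otherwise."""
    return 1 if svm_decision(x, support_vectors, alpha, t, w0, sigma) > 0 else -1
```

For pattern classification, x would be the two-element vector of box-counting dimension and lacunarity. Libraries such as scikit-learn parameterize the same kernel as exp(−γ‖x_i − x‖²), i.e. γ = 1/(2σ²).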

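The evaluation protocol of section 3.3 and eqs. (13) to (15) can be sketched as below. The per-class split follows the size-pattern example given there (a fixed fraction of the lesions with and without the pattern goes to training, rounded down, which yields 105 and 201 lesions for the 211/403 split at 50/50). `classify` stands for any hypothetical train-and-predict callable, and labels are assumed to be 1 (pattern present) and 0 (absent); these names are illustrative, not the authors' code.

```python
import numpy as np


def binary_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Accuracy, specificity and sensitivity, eqs. (13)-(15); labels are 1/0."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "accuracy": (tn + tp) / (tn + tp + fn + fp),
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
    }


def stratified_split(y: np.ndarray, train_fraction: float, rng):
    """Randomly pick floor(fraction * count) training samples from each class."""
    train = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        k = int(np.floor(train_fraction * idx.size))
        train.extend(rng.choice(idx, size=k, replace=False))
    train = np.sort(np.array(train, dtype=int))
    test = np.setdiff1d(np.arange(y.size), train)
    return train, test


def repeated_evaluation(X, y, classify, train_fraction, repeats=100, seed=0):
    """Average the metrics over `repeats` random splits (100 in the paper)."""
    rng = np.random.default_rng(seed)
    totals = {"accuracy": 0.0, "specificity": 0.0, "sensitivity": 0.0}
    for _ in range(repeats):
        tr, te = stratified_split(y, train_fraction, rng)
        y_pred = classify(X[tr], y[tr], X[te])
        for key, value in binary_metrics(y[te], y_pred).items():
            totals[key] += value
    return {key: value / repeats for key, value in totals.items()}
```

Keeping the class proportions fixed in every repetition means that the averaged sensitivity and specificity are always computed on test sets with the same class balance.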

12 lesions scored 5 and 25 scored 4. The database contains 12 lesions that have been diagnosed and confirmed by a pathology report to be malignant melanoma.

No. of lesions with a pattern

No. of lesions without a pattern

Bright dots Brain pattern Blood lacunes Dermal melanin Blood vessels Blood displacement Size

106 139 116 503 97 114 211

523 475 574 389 593 576 403

Collagen displacement Collagen excess

347 19

282 610

Pattern

4.2. Hunter score patterns classification We measured the overall classification accuracy, specificity, and sensitivity of our classifier. SVM classification results are presented in table 4. In the first column, classified patterns are presented. As mentioned previously, for each pattern more than 37 binarization methods were used. Our research indicated that some binarization methods are better than others. The method that gives the best accuracy was introduced by (Shanbhag, 1994). It gives the highest accuracy for bright dots, brain pattern, blood vessels and collagen displacement. The method introduced in (Kapur et al., 1985) gives the highest accuracy for dermal melanin, blood displacement, and size patterns. For collagen excess and blood lacunes, the method introduced in (Glasbey, 1993) gives the best results. Each pattern classification was performed for a few different training runs to test data ratio. In the second column, we show the results for the training data set of 10% of the database. At the same time, the testing data is 90% of the database, as shown in table 2. In the next four columns, we present classification results for different training/testing ratios. Accuracy for each pattern is marked blue. Classification sensitivity for a given pattern is marked green. Classification specificity for a given pattern is marked red. Overall classification accuracy is in most cases above 90%. For patterns such as brain pattern, collagen displacement and size, accuracy is below 90%. On the other hand, even the three lowest accuracies are above 80%. The lowest accuracy is given for brain pattern. The sensitivity of pattern classification is equal to or above 90% for patterns such as dermal melanin, blood vessels, blood displacement, size and collagen displacement. Sensitivity below 90% but above 80% is given for two patterns: bright dots and blood lacunes. Only for brain pattern and collagen excess is sensitivity below 80%. 
The lowest sensitivity is given for brain pattern. Specificity is high for almost all patterns and is only below 90% in the case of collagen displacement, brain pattern and size. The lowest specificity is given for size pattern. In table 5 neural network classification results are shown. Only patterns with a good result are shown. Neural networks give worse classification results than SVM. No pattern had accuracy above 90%. In all cases, it is around 80% or above, but never above 88%. We can observe an overfitting problem for some patterns for a data ratio of 80/20. In the case of SVM, this can be observed for collagen excess and brain patterns. This happens more often for FNN and can be observed for patterns such as brain pattern, dermal melanin, blood displacements, and size. The differences in accuracy in the aforementioned cases are insignificant for 70/30 and 80/20 data ratios. In table 6 and figure 6 the performance of the SVM classifier is shown. Patterns such as blood lacunes, dermal melanin and size are easily recognized. Brain pattern and blood displacements are recognized well. The most problematic patterns are

CR IP T

Table 2: Database of images grouped by Hunter score patterns

4. Results

AN US

This section is divided into a few parts. In the first, the database used for classification is presented. The next and most important part covers the experimental results of binary pattern classification and Hunter semi-score classification. 4.1. Database

AC

CE

PT

ED

M

The database is provided by a distinguished dermatology institute in Krakow. It is not a public database and there are no open SIAscope image databases available so far. It consists of about 960 lesions collected over 2 years with a SIAscope. For each lesion a Hunter score, MoleMate and dermatologist comment is given. The database was filtered and we removed all images for which the dermatologist was unable to distinguish if it was a benign or malignant lesion because the image was underexposed or contained hairs . We kept images where the lesion is visible. Finally, we had the database of images shown in table 2. In almost all cases there are more lesions without a pattern than with. Dermal melanin is the only pattern for which the number is higher for lesions that have it than for those that do not. In our database, we have only 97 lesions with blood vessels. The two last patterns in table 2, both of which were found during this research, are not Hunter score patterns. Both patterns are new, but there is no proof they have a strict relation with malignant melanoma. In this research, we came to understand that collagen displacement and collagen excess patterns do not appear together with bright dots. This should improve the classification results in further research. Table 3: Database grouped by Hunter semi-score

Hunter score

0

1

2

3

4

5

Count of lesions

65

144

106

64

25

12

The same database as shown in table 2 can be divided by Hunter’s semi-score (see table 3). As previously described, we were not able to obtain the full Hunter score . There are only 8

ACCEPTED MANUSCRIPT

Table 4: C-SVM pattern classification accuracy (marked blue), sensitivity (marked green) and specificity (marked red)

89.7% 69.0% 93.7% 81.7% 66.2% 84.1% 92.3% 80.2% 94.5% 94.2% 95.6% 92.4% 90.3% 76.6% 91.6% 90.1% 83.2% 90.9% 83.8% 95.5% 80.9%

92.0% 78.1% 94.4% 82.2% 67.0% 84.6% 92.3% 79.9% 94.6% 94.1% 95.5% 92.4% 91.8% 85.7% 92.3% 92.1% 92.0% 92.1% 83.9% 95.4% 81.1%

93.4% 83.8% 95.1% 82.2% 68.2% 84.4% 92.7% 81.2% 94.8% 94.5% 95.3% 93.4% 92.6% 90.0% 92.8% 92.6% 94.7% 92.3% 84.1% 95.6% 81.2%

76.8% Collagen 80.7% displacement 73.4% 97.0% Collagen excess 59.6% 97.8%

85.6% 88.0% 82.9% 97.7% 75.2% 98.0%

87.4% 88.9% 85.8% 97.9% 77.5% 98.3%

89.1% 90.5% 87.3% 98.0% 79.4% 98.4%

89.7% 91.6% 87.4% 97.9% 78.5% 98.4%

Dermal melanin Blood vessels Blood displacement Size

Collagen displacement

1

78.0% 85.9% 83.0%

78.0% 85.9% 83.0%

78.4% 85.9% 85.5%

78.5% 87.3% 86.6%

78.3% 87.4% 86.5%

83.0%

85.4%

85.7%

85.9%

85.9%

97.5%

79.5%

79.5%

81.1%

81.0%

75.9%

79.5%

80.1%

80.6%

80.9%

0,6

0,8

1,0

0,8

M

Blood lacunes

ED

Brain pattern

Brain pattern Blood lacunes Dermal melanin Blood displacements Size

CR IP T

85.5% 56.7% 91.4% 81.3% 64.4% 83.8% 91.2% 75.9% 94.2% 93.7% 95.5% 91.6% 87.2% 57.2% 90.3% 86.8% 64.5% 89.9% 83.3% 94.0% 80.6%

AN US

78.6% 33.5% 86.1% 78.9% 56.1% 83.0% 88.4% 66.4% 92.8% 92.2% 95.2% 89.1% 81.1% 27.3% 87.4% 80.0% 32.6% 85.7% 82.3% 93.1% 79.8%

Bright dots

Classification results by test/train ratio 10/90 50/50 60/40 70/30 80/20

Pattern

sensitivity

Pattern

Table 5: FNN pattern classification accuracy

Classification results by test/train ratio 10/90 50/50 60/40 70/30 80/20

0,6

0,4

0,2

0 0,0

PT

bright dots and blood vessels. Classification of blood vessels is the least accurate. The third column also shows the best binarization methods. Only three binarization methods give the best accuracy: Shanbhag (Shanbhag, 1994), Kapur (Kapur et al., 1985) and Glasbey (Glasbey, 1993).

0,2

0,4

1−specificity

CE

Figure 6: SVM based ROC accuracy examples of analyzed patterns marked as: blue – bright dots, green – brain pattern, gray – blood lacunes, orange – dermal melanin, black – blood vessels, red – blood displacement, cyan – size

AC

Table 6: ROC AUC values of SVM classification and the best binarization method for each Hunter score pattern

Pattern              AUC value   Binarization method
Bright dots          0.6275      Shanbhag (Shanbhag, 1994)
Brain pattern        0.7816      Shanbhag (Shanbhag, 1994)
Blood lacunes        0.9399      Glasbey (Glasbey, 1993)
Dermal melanin       0.9839      Kapur (Kapur et al., 1985)
Blood vessels        0.5922      Shanbhag (Shanbhag, 1994)
Blood displacement   0.7955      Kapur (Kapur et al., 1985)
Size                 0.9095      Kapur (Kapur et al., 1985)

4.3. Hunter semi-score multiclass classification

Because SVM gave better results than the FNN in pattern classification, we used SVM to classify the Hunter semi-score, which is a multiclass classification problem. As candidates, we took all features of all patterns used in the previous classification. F-scores show that some features are more informative than others, so we kept the seven most valuable ones: the lacunarity and box-counting dimension of size, dermal melanin and blood lacunes, and the lacunarity of blood displacement. The kernel and SVM type are the same as in pattern classification.

Hunter semi-score classification performs worse than any of the individual pattern classifications in this research. The results are presented in Table 7: classification accuracy is only 36.0% when 10% of the database is used for training and rises to 69.2% when 80% of the database is used for training. All errors, expressed as percentages, are shown in Table 8. For patients, scores 3, 4 and 5 are the most important. For a data ratio of 50/50, accuracy is highly unsatisfactory, especially for scores 3, 4 and 5. For ratios of 60/40 and 70/30, accuracy is still unsatisfactory. Only for a data ratio of 80/20 is accuracy on a par with that of doctors (Menzies et al., 2005), especially for scores 0, 1 and 5, for which accuracy is about 75%; for the other scores, accuracy is about 60%.

Table 7: Classification results of Hunter's semi-score grouped by test/train ratio

Test/train ratio   10/90   50/50   60/40   70/30   80/20
Accuracy           36.0%   57.6%   62.3%   65.7%   69.2%

Table 8: Hunter semi-score confusion matrix

Classification results by test/train ratio and score value, in %. Rows: label (score 0 to 5); columns: Hunter semi-score value (0 to 5).

Ratio 50/50   0       1       2       3       4       5
score 0       61.6%   9.9%    5.1%    5.6%    0.7%    0.0%
score 1       21.5%   69.5%   32.3%   30.9%   14.0%   50.0%
score 2       10.6%   14.7%   51.7%   12.8%   16.9%   10.0%
score 3       5.5%    5.3%    8.1%    49.1%   10.0%   5.0%
score 4       0.0%    0.5%    2.2%    1.3%    37.7%   0.0%
score 5       0.8%    0.1%    0.6%    0.3%    0.0%    35.0%

Ratio 60/40   0       1       2       3       4       5
score 0       66.1%   7.8%    6.4%    6.5%    0.0%    0.0%
score 1       22.3%   76.9%   29.0%   32.7%   48.0%   36.0%
score 2       7.7%    9.1%    56.5%   9.6%    10.0%   10.0%
score 3       3.9%    4.4%    6.5%    47.4%   4.0%    2.0%
score 4       0.0%    1.6%    1.6%    3.1%    38.0%   0.0%
score 5       0.0%    0.2%    0.0%    0.7%    0.0%    52.0%

Ratio 70/30   0       1       2       3       4       5
score 0       74.0%   12.0%   5.0%    4.2%    0.0%    2.5%
score 1       15.5%   75.3%   26.8%   24.2%   40.0%   32.5%
score 2       5.5%    8.9%    60.6%   12.6%   16.2%   10.0%
score 3       5.0%    3.3%    4.4%    57.9%   3.8%    5.0%
score 4       0.0%    0.5%    3.2%    1.1%    40.0%   0.0%
score 5       0.0%    0.0%    0.0%    0.0%    0.0%    50.0%

Ratio 80/20   0       1       2       3       4       5
score 0       75.4%   6.9%    7.6%    6.1%    4.0%    0.0%
score 1       15.4%   77.9%   21.9%   23.1%   24.0%   25.0%
score 2       7.7%    9.7%    60.5%   8.5%    12.0%   0.0%
score 3       1.5%    4.8%    5.2%    61.5%   0.8%    0.0%
score 4       0.0%    0.4%    4.3%    0.0%    58.0%   0.0%
score 5       0.0%    0.3%    0.5%    0.8%    0.0%    75.0%

5. Conclusion

The main conclusions drawn from the experiments performed so far are the following.

First, we have shown that fractal methods combined with binarization methods and SIAscope images can be useful for automated recognition of some patterns. Our results are difficult to compare with previous research, as this is the first time that SIAscope images have been used for pattern recognition, and a direct comparison is also precluded by the different databases and image types used. We can, however, compare accuracy and other performance indicators with ours. In (Sadeghi et al., 2013), the best pattern recognition result is 89.3%. Taking our results with 70% of the database used for training, only for brain pattern, size and collagen displacement did we achieve lower accuracy; for all other patterns, our accuracy was higher. Compared with the sensitivity and specificity of doctors (Menzies et al., 2005), our results indicate that the method can help GPs increase their sensitivity and specificity. The presented results also indicate that fractal analysis can increase the accuracy of experts (Menzies et al., 2005). Compared to (?), we conclude that the pattern analysis proposed in this paper can increase the accuracy of smartphone applications used for melanoma diagnosis. In (Celebi & Zornberg, 2014), where the color feature is recognized, the authors achieved a sensitivity of 62% and a specificity of 76%. With the same share of training data as in the previous comparison (70%), we achieved higher sensitivity and specificity for each considered pattern.

Second, for melanoma patterns, some binarization methods perform better than others. We tested 37 binarization methods and found several that give good results in the analysis of skin images. Most researchers choose Otsu's method (Otsu, 1979) (Abuzaghleh et al., 2015; Premaladha & Ravichandran, 2016); however, as we have shown in this paper, it is not always the best method.

Third, fractal methods gave good or very good results for most of the patterns we analyzed. In a few cases our method does not perform well, for example in the detection of brain patterns; for such patterns, other methods need to be found.

6. Further work

In this research, only two classifiers were used; we intend to test our approach on the available database with other classification methods. Our results suggest that fractal methods can help increase doctors' accuracy in diagnosing melanoma; to confirm this, we would need to combine our method with doctors' diagnostic processes. The next step would be to combine fractal features with standard melanoma diagnostic methods such as ABCD, Menzies, CASH and so on. If the number of lesions in the database can be increased, deep learning methods could also be applied and the results compared with those of Premaladha & Ravichandran (2016).
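The conclusions rest on two fractal features computed from binarized images, the box-counting dimension and lacunarity, and on the F-score ranking used in Section 4.3 to keep the most valuable features. A minimal NumPy sketch of all three follows, under stated assumptions: the input is a 2-D boolean mask produced by a thresholding step, box counting uses dyadic box sizes and a least-squares fit of log N(s) against log s, lacunarity uses the gliding-box definition Λ(r) = E[M²]/E[M]² over all r×r windows, and the F-score follows the two-class definition of Chen & Lin (2006). The estimators actually used in the paper (the references include Chen et al., 1993, and Dong, 2000) may differ, and the function names and the example feature names are hypothetical.

```python
import numpy as np


def box_counting_dimension(mask: np.ndarray) -> float:
    """Box-counting dimension of a binary image using dyadic box sizes."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask has no foreground pixels")
    n = min(mask.shape)
    sizes, counts = [], []
    s = n
    while s >= 1:
        m = (n // s) * s  # crop so the image tiles exactly into s x s boxes
        boxes = mask[:m, :m].reshape(m // s, s, m // s, s)
        occupied = int(boxes.any(axis=(1, 3)).sum())  # N(s): boxes hit by foreground
        if occupied:
            sizes.append(s)
            counts.append(occupied)
        s //= 2
    # N(s) ~ s**(-D), so D is minus the slope of log N(s) against log s.
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return float(-slope)


def lacunarity(mask: np.ndarray, r: int) -> float:
    """Gliding-box lacunarity E[M^2] / E[M]^2 over all r x r windows."""
    mask = np.asarray(mask, dtype=float)
    if not 1 <= r <= min(mask.shape):
        raise ValueError("box size r must be between 1 and the image size")
    # Integral image with a leading zero row/column: ii[i, j] = mask[:i, :j].sum().
    ii = np.pad(mask.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    # Box mass M at every top-left position from four integral-image lookups.
    mass = ii[r:, r:] - ii[:-r, r:] - ii[r:, :-r] + ii[:-r, :-r]
    mean = mass.mean()
    if mean == 0:
        raise ValueError("no foreground pixels")
    return float((mass ** 2).mean() / mean ** 2)


def f_scores(X: np.ndarray, y) -> np.ndarray:
    """Two-class F-score of each feature column (Chen & Lin, 2006).

    Assumes at least two samples in each class (sample variances use ddof=1).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=bool)
    pos, neg, mean_all = X[y], X[~y], X.mean(axis=0)
    between = (pos.mean(axis=0) - mean_all) ** 2 + (neg.mean(axis=0) - mean_all) ** 2
    within = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    # Zero within-class spread: separating features get +inf, constant ones 0.
    scores = np.full_like(between, np.inf)
    np.divide(between, within, out=scores, where=within > 0)
    scores[(within == 0) & (between == 0)] = 0.0
    return scores


full = np.ones((64, 64), dtype=bool)
line = np.zeros((64, 64), dtype=bool)
line[32, :] = True
print(round(box_counting_dimension(full), 3), round(box_counting_dimension(line), 3))  # 2.0 1.0

# Hypothetical feature table: 4 lesions x 2 features, binary pattern label.
names = ["lacunarity_size", "noise"]
X = np.array([[0.0, 1.0], [0.1, -1.0], [1.0, 1.0], [1.1, -1.0]])
y = [0, 0, 1, 1]
ranking = [names[i] for i in np.argsort(f_scores(X, y))[::-1]]  # best first
```

Lacunarity equals 1 for a completely filled image and grows as the foreground becomes more gappy, which is why it complements the box-counting dimension as a texture descriptor.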


Acknowledgment

The research has been supported by Grant N N518 506439 from the National Science Center of the Republic of Poland.

References


Abbadi, N. K. E., & Miry, A. H. (2014). Automatic segmentation of skin lesion using histogram thresholding. Journal of Computer Science, 10, 632–639.
Abuzaghleh, O., Barkana, B. D., & Faezipour, M. (2015). Noninvasive real-time automated skin lesion analysis system for melanoma early detection and prevention. Journal of Computer Science, 3, 632–639.
Afifi, S., GholamHosseini, G., & Sinha, R. (2016). Hardware acceleration of SVM-based classifier for melanoma images. Lecture Notes in Computer Science, 9555, 235–245.
Argenziano, G., Catricalà, C., Ardigo, M., et al. (2011). Seven-point checklist of dermoscopy revisited. The British Journal of Dermatology, 4, 785–790.
Celebi, M. E., Kingravi, H. A., Uddin, B., et al. (2008). Border detection in dermoscopy images using statistical region merging. Skin Research and Technology, 14, 347–353.
Celebi, M. E., & Zornberg, A. (2014). Automated quantification of clinically significant colors in dermoscopy images and its application to skin lesion classification. IEEE Systems Journal, 8, 980–984.
Chen, S., Keller, J., & Crownover, R. (1993). On the calculation of fractal features from images. Neural Computation, 5, 305–316.
Chen, Y., & Lin, C. (2006). Combining SVMs with various feature selection strategies. Studies in Fuzziness and Soft Computing, 207, 315–324.
Cotton, S., Morse, R., & Chellingworth, M. (2009). Method and apparatus for measuring collagen thickness. US2007/0161910 A1.
D'Alessandro, B., & Dhawan, A. P. (2012). 3-D volume reconstruction of skin lesions for melanin and blood volume estimation and lesion severity analysis. IEEE Transactions on Medical Imaging, 31, 2083–2092.
Dong, P. (2000). Test of a new lacunarity estimation method for image texture analysis. International Journal of Remote Sensing, 21, 3369–3373.
Dubovitskiy, D., Devyatkov, V., & Richer, G. (2014). The application of mobile devices for the recognition of malignant melanoma. In Proceedings of the International Conference on Biomedical Electronics and Devices (pp. 140–146).
Ercal, F., Chawla, A., Stoecker, W., et al. (1994). Neural network diagnosis of malignant melanoma from color images. IEEE Transactions on Biomedical Engineering, 41, 837–845.
Gilmore, S., Hofmann-Wellenhof, R., Muir, J., & Soyer, H. P. (2009). Lacunarity analysis: A promising method for the automated assessment of melanocytic naevi and melanoma. PLoS ONE, 4, 1–10.
Glasbey, C. (1993). An analysis of histogram-based thresholding algorithms. Graphical Models and Image Processing, 55, 532–537.
Hacioglu, S., Saricaoglu, H., Baskan, E. B., et al. (2013). The value of spectrophotometric intracutaneous analysis in the noninvasive diagnosis of nonmelanoma skin cancers. Clinical and Experimental Dermatology, 38, 464–469.
Heine, H. A., Scholsser, H., Schade, D., et al. (2004). Dermatoscope. US2004/0062056 A1.
Henning, J., Dusza, S., Wang, S., et al. (2007). The CASH (color, architecture, symmetry, and homogeneity) algorithm for dermoscopy. Archives of Dermatology, 56, 45–52.
Huang, L., & Jang, M. (1995). Image thresholding by minimizing the measure of fuzziness. Pattern Recognition, 28, 41–51.
Hunter, J., Moncrieff, M., Hall, P., et al. (2006). The diagnostic characteristics of siascopy versus dermoscopy for pigmented skin lesions presenting in primary care. British Association of Dermatology.
Kapur, J., Sahoo, P., & Wong, A. (1985). A new method for gray-level picture thresholding using the entropy of the histogram. Computer Vision, Graphics, and Image Processing, 29, 273–285.
Kittler, H., Riedl, E., Rosendahl, C., & Cameron, A. (2008). Dermatoscopy of unpigmented lesions of the skin: a new classification of vessel morphology based on pattern analysis. Dermatopathology: Practical and Conceptual.
Kockara, S., Mete, M., & Halic, T. (2015). Fractals for malignancy detection in dermoscopy images. In International Conference on Healthcare Informatics (pp. 115–121).
Korotkov, K., & Garcia, R. (2012). Computerized analysis of pigmented skin lesions: A review. Artificial Intelligence in Medicine, 56, 69–90.
Kruk, M., Swiderski, B., Osowski, S., et al. (2015). Melanoma recognition using extended set of descriptors and classifiers. EURASIP Journal on Image and Video Processing, 43.
Kuncheva, L. (2014). Combining pattern classifiers: Methods and algorithms. Wiley.
Lee, T. K., & Claridge, E. (2005). Predictive power of irregular border shapes for malignant melanomas. Skin Research and Technology, 11, 1–8.
Maier, T., Kulichova, D., Schotten, K., et al. (2015). Accuracy of a smartphone application using fractal image analysis of pigmented moles compared to clinical diagnosis and histological result. Journal of the European Academy of Dermatology and Venereology, 29, 663–667.
Manousaki, A. G., Manios, A. G., Tsompanaki, E. I., & Tosca, A. D. (2006). Use of color texture in determining the nature of melanocytic skin lesions – a qualitative and quantitative approach. Computers in Biology and Medicine, 36, 419–427.
Menzies, S. W., Bischof, L., Talbo, H., et al. (2005). The performance of SolarScan: an automated dermoscopy image analysis instrument for the diagnosis of primary melanoma. Archives of Dermatology, 141, 1388–1396.
Menzies, S. W., Ingvar, C., Crotty, K. A., & McCarthy, W. H. (1996). Frequency and morphologic characteristics of invasive melanomas lacking specific surface microscopic features. Archives of Dermatology, 132, 1178–1182.
Meyer, R. (1998). Device for close-up photography of surfaces. US5825502 A.
Moncrieff, M., Cotton, S., Claridge, E., & Hall, P. (2002). Spectrophotometric intracutaneous analysis: a new technique for imaging pigmented skin lesions. British Journal of Dermatology, 146, 448–457.
Mullani, N. A. (2004). Dermoscopy epiluminescence device employing cross and parallel polarization. US2004/0174525 A1.
Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9, 62–66.
Park, J., & Sandberg, I. (1993). Pattern classification using neural networks. Neural Computation, 5, 305–316.
Pires, V. B., & Barcelos, C. A. Z. (2007). Edge detection of skin lesions using anisotropic diffusion. In Seventh International Conference on Intelligent Systems Design and Applications, ISDA'07 (pp. 363–370).
Premaladha, J., & Ravichandran, K. S. (2016). Novel approaches for diagnosing melanoma skin lesions through supervised and deep learning algorithms. Journal of Medical Systems, 40.
Sadeghi, M., Lee, T. K., McLean, D., et al. (2013). Detection and analysis of irregular streaks in dermoscopic images of skin lesions. IEEE Transactions on Medical Imaging, 32, 849–861.
Sezgin, M., & Sankur, B. (2004). Survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, 13, 146–165.
Shanbhag, A. (1994). Utilization of information measure as a means of image thresholding. Graphical Models and Image Processing, 56, 414–419.
Smith, B. T. (n.d.). Lagrange multipliers tutorial in the context of support vector machines. http://www.engr.mun.ca/~baxter/Publications/LagrangeForSVMs.pdf.
Soyer, H. P., Argenziano, G., Hofmann-Wellenhof, R., & Johr, R. H. (2010). Color Atlas of Melanocytic Lesions of the Skin. Springer.
Soyer, P., Argenziano, G., & Zalaudek, I. (2004). Three-point checklist of dermoscopy. Dermatology, 208, 27–31.
Stolz, W., Riemann, A., Cognetta, A. B., et al. (1994). ABCD rule of dermatoscopy: a new practical method for early recognition of malignant melanoma. Journal of Dermatology, 4, 521–527.
Sáez, A., Sanchez-Monedero, J., Gutierrez, P. A., & Hervas-Martinez, C. (2015). Machine learning methods for binary and multiclass classification of melanoma thickness from dermoscopic images. IEEE Transactions on Medical Imaging, 34, 1–10.
Sáez, A., Serrano, C., & Acha, B. (2014). Model-based classification methods of global patterns in dermoscopic images. IEEE Transactions on Medical Imaging, 33, 1137–1147.
Tomatis, S., Carrara, M., Bono, A., et al. (2005). Automated melanoma detection with a novel multispectral imaging system: results of a prospective study. Physics in Medicine and Biology, 50, 1675–1687.
Wu, X., Kumar, V., Quinlan, J., et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14, 1–37.
