Image datasets of cocoa beans for taxonomy nuances evaluation


Journal Pre-proof

Image Datasets of Cocoa Beans for Taxonomy Nuances Evaluation
F.A. Santos, E.S. Palmeira, G.J. Jesus

PII: S2352-3409(19)31010-8
DOI: https://doi.org/10.1016/j.dib.2019.104655
Reference: DIB 104655
To appear in: Data in Brief
Received Date: 22 July 2019
Revised Date: 29 September 2019
Accepted Date: 7 October 2019

Please cite this article as: F.A. Santos, E.S. Palmeira, G.J. Jesus, Image Datasets of Cocoa Beans for Taxonomy Nuances Evaluation, Data in Brief, https://doi.org/10.1016/j.dib.2019.104655. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 The Author(s). Published by Elsevier Inc.

*Title:

Image Datasets of Cocoa Beans for Taxonomy Nuances Evaluation

*Authors:

Santos, F. A., Palmeira, E. S. and Jesus, G.Q.

*Affiliations:

Universidade Estadual de Santa Cruz

*Contact email:

[email protected], [email protected] and [email protected].

*Category:

Agronomy, Crop Science


Abstract

Some classification methods produce nuances in their final accuracy caused by object positioning, framing and damage. These effects can reduce the accuracy of computer vision systems that were trained on structured, static datasets but are intended for day-to-day applications in which images are not always as organized as the training data, as happens in some biometric classification systems such as iris and fingerprint recognition. In this regard, this paper presents six image datasets processed with different methods to help researchers analyze the impact of object positioning, framing and damage on their taxonomies.

Keywords: Cocoa beans. Cut Test. Object positioning. Taxonomy evaluation.

Specifications Table

Subject area: Agronomy, Crop Science
More specific subject area: Image Processing
Type of data: Images (JPG)
How data was acquired: Different image processing methods, implemented in the R language and MATLAB, were applied to an existing dataset of cut-test-classified cocoa beans.
Data format: Raw (preprocessed .JPG image files)
Experimental factors: The methods were selected with the intent of generating datasets that facilitate the use and comparison of different classification methods and of the impact of the positioning of objects in images.
Experimental features: The source dataset (presented in [1]) was submitted to multiple research groups in Brazil, and some tests (such as with fuzzy associative memories) returned different accuracies for the same objects as their positions were algorithmically manipulated [2,3].
Data source location: Ilhéus, Bahia, Brazil
Data accessibility: Data is available at Mendeley Data (see reference [4] for the DOI).
Related research article: Felipe A. Santos (2019). "Modelagem de Um Sistema de Visão Computacional para a Classificação de Amêndoas de Cacau na Prova de Corte" (Master's thesis). State University of Santa Cruz, Ilhéus, Brazil.

Value of the Data

• These data can be used to test the impact of object positioning on computational classification methods;
• These datasets can facilitate the comparison of classification accuracies between different research groups;
• The data will allow the creation, training and improvement of computer vision models to help specialists in the Cut Test classification;
• The data can be used to evaluate the impact of different levels of damage caused to images during processing on the results of classification methods.

1. Data

Six image datasets of cut-test-classified cocoa beans were created from the research presented in [3]. Each dataset contains 14 classes (namely: Compartmentalized Brown, Compartmentalized White, Compartmentalized Partially Purple, Compartmentalized Purple, Compartmentalized Slatty, Plated Brown, Plated White, Plated Partially Purple, Plated Purple, Plated Slatty, Moldered, Flattened, Brittle and Agglutinated) with 100 images per class, totaling 1400 images per dataset. Fig. 1 presents an image of the source dataset and Fig. 2 presents the six results for the same bean according to the preprocessing methods.

Fig. 1: A Compartmentalized Brown bean from the source dataset

Fig. 2: The six versions created from the bean from Figure 1

2. Materials and Methods

This section explains the processing applied to the source images to produce the resulting datasets. Each subsection presents the method used to reach one of the six preprocessed versions. These methods were obtained through empirical tests with a fuzzy associative memories implementation that is sensitive to object positioning: the six dataset versions yielded six different accuracies for the same classification method, with a standard deviation of approximately 10.07% between them.

2.1. A-Method: Background Removed

Available under the name "background_removed_-_version_1_-_method_a.rar" in the repository, this dataset was generated by applying four steps (see Fig. 3) to each image:

i. The CIELAB color space was used to remove the background;
ii. The image was binarized: every pixel that was not pure RGB black was turned into pure RGB white;
iii. All connected white pixels were labeled as regions and only the largest region from step (ii) was preserved;
iv. The result of step (iii) was applied as a mask to the source image.

Fig. 3: (a) Source; (b) Background Removed; (c) Binarized; (d) Preservation of Largest Body; (e) Application of Mask in Source Image.
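The A-Method pipeline can be sketched as follows. This is a minimal illustration in Python with NumPy and SciPy (the original processing was implemented in R and MATLAB, so this is an assumed equivalent, not the authors' code); it assumes step (i), the CIELAB-based background removal, has already turned the background pure black, and covers steps (ii) to (iv):

```python
import numpy as np
from scipy import ndimage

def keep_largest_region_mask(rgb):
    """Binarize (non-black -> white), keep the largest connected
    white region, and use it to mask the source image."""
    # Step ii: any pixel that is not pure RGB black becomes white (True).
    binary = np.any(rgb != 0, axis=-1)
    # Step iii: label connected white regions and keep only the largest.
    labels, n = ndimage.label(binary)
    if n == 0:
        return np.zeros_like(rgb)
    sizes = ndimage.sum(binary, labels, index=range(1, n + 1))
    largest = labels == (np.argmax(sizes) + 1)
    # Step iv: apply the largest-region mask to the source image.
    return rgb * largest[..., None]

# Tiny synthetic example: two blobs; only the larger one survives.
img = np.zeros((6, 6, 3), dtype=np.uint8)
img[1:4, 1:4] = 200   # large blob (9 pixels)
img[5, 5] = 200       # small stray blob (1 pixel)
out = keep_largest_region_mask(img)
```

Preserving only the largest connected component is what removes stray background artifacts, at the cost of also removing any disconnected fragments of the bean (the damage addressed later by the B-Method).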

2.2. A-Method: Only Image

To create this dataset, all images resulting from "A-Method: Background Removed" were cropped to the minimum rectangle capable of fitting each bean, resulting in images with varying widths and heights, as shown in Fig. 4. This dataset is under the name "framed_and_centralized_-_version_1_-_method_a.rar" in the repository.
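The minimum-rectangle crop described above can be sketched as follows (a Python/NumPy illustration under the same assumption as before, that the background is pure black; the original processing was done in R and MATLAB):

```python
import numpy as np

def crop_to_bean(rgb):
    """Crop to the minimum rectangle that fits all non-black pixels."""
    mask = np.any(rgb != 0, axis=-1)       # True where the bean is
    rows = np.any(mask, axis=1)            # rows containing bean pixels
    cols = np.any(mask, axis=0)            # columns containing bean pixels
    r0, r1 = np.where(rows)[0][[0, -1]]    # first and last such row
    c0, c1 = np.where(cols)[0][[0, -1]]    # first and last such column
    return rgb[r0:r1 + 1, c0:c1 + 1]

# Example: a 3x5 bright patch inside a 10x10 black image.
img = np.zeros((10, 10, 3), dtype=np.uint8)
img[2:5, 3:8] = 100
cropped = crop_to_bean(img)
```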

Fig. 4: (a) and (b) are images from "A-Method: Background Removed" and (c) and (d) are their respective images cropped to the minimum fitting rectangle.

2.3. A-Method: Framed and Centralized

All beans from "A-Method: Background Removed" were measured and then centralized in the smallest rectangle capable of fitting all beans; thus all images have the same dimensions: 3011x2851 pixels. Two samples of this process can be seen in Fig. 5. This dataset is under the name "framed_and_centralized_-_version_2_-_method_a.rar" in the repository.

Fig. 5: (a) and (b) are images from “A-Method: Background Removed” and (c) and (d) are their respective centralized beans in the created frame.
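The centralization step can be sketched as follows (again a Python/NumPy illustration of the idea, not the authors' R/MATLAB code): each cropped bean is pasted at the center of a fixed-size black frame whose dimensions are the smallest rectangle fitting every bean in the dataset (3011x2851 in the paper).

```python
import numpy as np

def center_in_frame(cropped, frame_h, frame_w):
    """Place a cropped bean at the center of a fixed-size black frame."""
    h, w = cropped.shape[:2]
    frame = np.zeros((frame_h, frame_w, 3), dtype=cropped.dtype)
    top = (frame_h - h) // 2       # vertical offset to center the bean
    left = (frame_w - w) // 2      # horizontal offset to center the bean
    frame[top:top + h, left:left + w] = cropped
    return frame

# Example: a 3x4 white patch centered inside a 10x12 frame.
bean = np.full((3, 4, 3), 255, dtype=np.uint8)
frame = center_in_frame(bean, 10, 12)
```

Because every bean lands at the same position in an identically sized frame, this version removes positional variation entirely, which is what makes it useful as a baseline when measuring positioning-induced accuracy nuances.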

2.4. B-Method: Background Removed

This dataset (named "background_removed_-_version_2_-_method_b.rar" in the repository) was created with additional steps to restore the damage caused to some beans during background removal, as shown in Fig. 6:

i. The CIELAB color space was used to remove the background;
ii. The image was binarized: every pixel that was not pure RGB black was turned into pure RGB white;
iii. All connected white pixels were labeled as regions and only the largest region from step (ii) was preserved;
iv. The image was inverted;
v. All connected white pixels were labeled as regions and only the largest region from step (iv) was preserved;
vi. The image was inverted again;
vii. The result of step (vi) was applied as a mask to the source image.

Fig. 6: (a) Source; (b) Background Removed; (c) Binarized; (d) Preservation of Largest Body; (e) Inverted; (f) Preservation of Largest Body; (g) Inverted; (h) Application of Mask in Source Image.

It is important to note that the process that restores damage to the beans also restores any parts of the background contained inside them (caused by hollow areas, such as those of broken beans), as shown in Fig. 7.

Fig. 7: Sample of restored background
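The invert/keep-largest/invert trick of steps (iv) to (vi) is effectively a hole-filling operation, and can be sketched as follows (a Python/SciPy illustration of the idea; the original was implemented in R and MATLAB). Inverting the bean mask makes the outer background the largest connected region; keeping only that region and inverting back yields a bean mask with its interior holes filled:

```python
import numpy as np
from scipy import ndimage

def fill_holes_via_inversion(binary):
    """Steps iv-vi of the B-Method: invert the mask, keep the largest
    connected region of the inverse (the outer background), and invert
    again, which fills any holes inside the bean."""
    inverse = ~binary
    labels, n = ndimage.label(inverse)
    if n == 0:
        return binary
    sizes = ndimage.sum(inverse, labels, index=range(1, n + 1))
    outer = labels == (np.argmax(sizes) + 1)   # the outer background
    return ~outer                              # bean with holes filled

# A ring-shaped mask: the hole in the middle gets filled.
ring = np.zeros((7, 7), dtype=bool)
ring[1:6, 1:6] = True
ring[3, 3] = False          # hole inside the "bean"
filled = fill_holes_via_inversion(ring)
```

This also makes the side effect noted above visible in the code: any background pixel enclosed by the bean belongs to a hole, not to the outer region, so it ends up inside the final mask and survives into the masked image.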

2.5. B-Method: Only Bean

The same process used to create "A-Method: Only Image" was applied to the "B-Method: Background Removed" images to create this dataset (see Fig. 8). This dataset is under the name "framed_and_centralized_-_version_3_-_method_b.rar" in the repository.

Fig. 8: (a) and (b) are images from the “B-Method: Background Removed” and (c) and (d) are their respective cropped images to the minimum fitting rectangle

2.6. B-Method: Framed and Centralized

The same process used to create "A-Method: Framed and Centralized" was applied to the "B-Method: Background Removed" images to create this dataset. The rectangle obtained had the same dimensions as the one from "A-Method: Framed and Centralized" (3011x2851). Two samples of this process can be seen in Fig. 9. This dataset is under the name "framed_and_centralized_-_version_4_-_method_b.rar" in the repository.

Fig. 9: (a) and (b) are images from “B-Method: Background Removed” and (c) and (d) are their respective centralized beans in the created frame

Acknowledgments

This research was conducted in partnership with the Programa de Pós-Graduação em Modelagem Computacional em Ciência e Tecnologia (PPGMC) and the Centro de Inovação do Cacau (CIC). Financial support was provided by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and the Fundação de Amparo à Pesquisa do Estado da Bahia (FAPESB).

References

[1] Felipe A. Santos, Eduardo S. Palmeira and Gildson Q. Jesus. An Image Dataset of Cut-Test-Classified Cocoa Beans. Data in Brief, Volume 24, June 2019, Article 103916.
[2] Felipe A. Santos, Eduardo S. Palmeira and Gildson Q. Jesus. Color, structural and textural features for the classification of a cocoa beans image dataset using artificial neural network. In: XIV Workshop de Visão Computacional, 2018, Ilhéus. Anais WVC 2018, 2018, p. 80-84.
[3] Felipe A. Santos (2019). Modelagem de Um Sistema de Visão Computacional para a Classificação de Amêndoas de Cacau na Prova de Corte (Master's thesis). State University of Santa Cruz, Ilhéus, Brazil.
[4] Felipe A. Santos (2019). "Image Datasets of Cocoa Beans for Taxonomy Nuances Evaluation", Mendeley Data, V2, doi: 10.17632/pcx7mj68yn.4.