CONS an R based graphical interface to perform Consonance Analysis

CONS an R based graphical interface to perform Consonance Analysis

Accepted Manuscript Short Communication CONS an R Based Graphical Interface to Perform Consonance Analysis Victor M. Aguirre, N. Sofia Huerta-Pacheco,...

974KB Sizes 0 Downloads 51 Views

Accepted Manuscript Short Communication CONS an R Based Graphical Interface to Perform Consonance Analysis Victor M. Aguirre, N. Sofia Huerta-Pacheco, M. Teresa Lopez PII: DOI: Reference:

S0950-3293(17)30167-2 http://dx.doi.org/10.1016/j.foodqual.2017.07.008 FQAP 3364

To appear in:

Food Quality and Preference

Received Date: Revised Date: Accepted Date:

23 May 2017 24 July 2017 25 July 2017

Please cite this article as: Aguirre, V.M., Sofia Huerta-Pacheco, N., Teresa Lopez, M., CONS an R Based Graphical Interface to Perform Consonance Analysis, Food Quality and Preference (2017), doi: http://dx.doi.org/10.1016/ j.foodqual.2017.07.008

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

CONS an R Based Graphical Interface to Perform Consonance Analysis  By

Victor M. Aguirre Statistics Department, ITAM Email: [email protected]

N. Sofia Huerta-Pacheco Facultad de Estadística e Informática Universidad Veracruzana Email: [email protected] And M. Teresa Lopez Delphi Intelligence Email: [email protected]

1

Abstract With the advent of widespread computing and availability of open source programs to perform many different programming tasks, nowadays there is a trend in Statistics to program tailor made applications for non statistical customers in various areas. This is an alternative to having a large statistical package with many functions many of which never are used. In this article, we present CONS an R package dedicated to Consonance Analysis. Consonance Analysis is a useful numerical and graphical exploratory approach for evaluating the consistency of the measurements and the panel of people involved in sensory evaluation. It makes use of several uni and multivariate techniques either graphical or analytical, particularly Principal Components Analysis. The package is implemented in a graphical user interface in order to get a user friendly package.

Key words—Sensory evaluation, Sensory components analysis, R computer package

2

panel

performance,

Principal

1. Introduction In this paper we present the CONS package (Huerta, Aguirre, & Lopez, 2017), a package to perform Consonance Analysis, which is a technique oriented to evaluate the performance of a sensory panel. Sensory and consumer research often makes use of a panel of assessors to determine product attributes. In this application we assume that the panel evaluates several products with respect to different attributes. For example in the case of a food product a panel would produce scores on taste, hardness, smell, etc. These attributes depend on each assessor and of course it is not expected that they would be uniquely defined even if the panel is formed purely of trained judges. Several methods have been proposed for evaluating the panel performance such as these proposed by (Arnold, & Williams, 1985; Banfield, & Harries, 1975; Gower 1975; Dijksterhuis, & Gower 1991; Dijksterhuis, & Gower 1992; van der Burg, 1988; van der Burg, & Dijksterhuis, 1989). This package implements the procedure proposed in (Dijksterhuis, 1995; 2004). It is oriented towards the evaluation of consistency of the panel performance and the author named the method "consonance". A critical assumption of the method is that all of the judges use nominally the same product attributes. As mentioned by the author, the method provides compact graphical displays to evaluate the panel performance. The CONS package is an R based graphical interface (R Core Team, 2016), that performs consonance analysis on a data set using only point and click operations. It is based on the gWidgets library (Verzani, 2014 a, b), that provides a toolkit-independent API for building interactive GUIs and the tcltk library that provides access to the platform-independent Tcl scripting language and Tk GUI elements (R Core Team, 2016), and the Principal Component Analysis (PCA) from FactoMineR package (Le, Josse, & Husson, 2008). 2. Data Structure The data consists of a matrix X of order N  M  K . The panel has N products, judged with respect to M attributes by a panel that has K assessors. If there are replications this can be handled by the package replicating the number of products. We denote the elements of this matrix by xijk i  1,..., N ; j  1,..., M ; k  1,..., K . In order to perform the analysis, the three way matrix X is thought to be composed of M matrices X j of order ( N  K ), one for each attribute. Each column of this matrix corresponds to one assessor. If the K assessors interpret or evaluate an attribute similarly the X j matrix would span a one dimensional space. Hence, if PCA of this matrix shows a significant deviation from unidimensionality this would be evidence that the judges are interpreting the attribute differently. And that will show that there is no agreement within the panel. 3. Consonance Analysis

3

Denote by X j the previous matrix but with standardized columns. Then the matrix T

X j X j contains the inter-assessor correlations which measure the degree of agreement between assessors for this attribute. Notice that the columns of X j add to zero since the column mean is subtracted, hence there are K '  K  1 positive eigenvalues in the T spectral decomposition of X j X j . The underlying unidimensional structure is evaluated by the PCA. A large first Eigenvalue is a signal that the attribute is being evaluated in a similar fashion by the panel. Consonance analysis uses the following Eigenvalue decomposition T X j X j  QΛ 2QT , where

Λ2  diag{12 , 22 ,..., 2K '} , contains the Eigenvalues which are in decreasing order and the matrix Q has orthonormal columns. A measure of the size of each Eigenvalue is given by the "Variance Accounted For" ( VAF ) metric VAFk 

2k

r 1 2r K'

.

For k  1,..., K ' . The author also proposes a ratio C that he calls the Consonance C



12 K' r 2

2r

which measures the relative size of the first Eigenvalue with respect to the rest of Eigenvalues. The package CONS reports both VAF and C for each attribute. It also shows on a single plot all VAF’s corresponding to all attributes. A plot of all C’s is also provided. Dijksterhuis (2004) also suggests plotting what he calls the "loadings". Denote M[ 2] to be the first two columns of a matrix. The columns of matrix X j correspond to assessors, T

then the rows of matrix X j correspond to assessors, the loadings are defined by

(QΛ)[ 2] , T

notice that this is a two dimensional approximation of the rows of X j . This plot is also useful in analyzing panel consonance because if an assessor deviates too much from the cloud of the rest of the assessors this would be evidence that this assessor needs further training. It also serves to detect if an attribute has not been understood very well by the panel and for this reason some members of the panel evaluate it in a very different way. 4. Use of the Graphical Interface The CONS package is loaded in the usual fashion from the CRAN-R repository # Install

4

R> install.packages("CONS") #Call library R> library ("CONS") # Once the package is installed type on the console window. R> CONS() We will illustrate the use of CONS with the application of the procedure to an example coming from an R&D department of the chewing gum industry. The sensory data was collected on 20 samples of chewing gum formulations after letting the samples mature over a period of one week. Hence in this case the number of products is 20, however there were 3 sessions, in each session all of the products were evaluated once, hence each session corresponds to one replicate, and CONS considers a total of N = 60 products for the analysis, but the program also takes into account the information of the replicates. Eleven attributes were scored after 1 minute of chewing, and after 5 minutes; on a scale of 0 to 100, hence the number of attributes is M = 22. The sensory evaluation was performed by 9 assessors therefore K = 9. Technically speaking the computations are performed as if the number of products were 60, but the program takes explicitly into account the replicates when plotting the loadings, as suggested in (Dijksterhuis, 2004). The attributes evaluated were: Sticky, Natural Strawberry Flavor, Sweetness, Sourness, Bitterness, Creamy, Total Impact Flavor, Fresh Mouth, salivation, Dry Mouth and Waxy Palate. The data consists of 540 rows compounded by 20 samples times 9 assessors times 3 replicates and 25 columns. Figure 1 shows part of the excel data sheet set. The first column indicates assessor, the second samples or product, and the third corresponds to the session which corresponds to the replicate, and the following 22 columns correspond to the attributes, as mentioned before the first 11 attributes correspond to the evaluation after one minute, and the rest to the evaluation after five minutes.

Figure 1. Partial View, original dataset in *.xlxs format Note that it is very important that the data is in exactly the same order as shown in Figure 1. The first column corresponds to Assessors - it could be names or numbers. The second column corresponds to Products - names or numbers too. The third column corresponds to replicates - names or numbers. There could be more than two replicates as long as they are ordered in the same way for each Assessor-Product combination.

5

CONS assumes that the data is balanced, in the sense that the number of replicates is the same for each Assessor-Product combination. If there are no replicates then this column is not present. Missing data are not allowed. If an attribute has missing observations it should be completely discarded. If a replicate has some missing observations it should also be completely discarded in order to keep the balanced structure of the data

Figure 2. Front face CONS package.

Figure 2 shows the front face of the module. This is a point and click package and hence the user can click on any of the words on top: File, Method, Help, each of which opens a menu of options. To indicate the sequence of clicks needed for a certain operation we are going to use the notation Click1 > Click2 > Click3 > etc. To observe the numerical results of the operations, the user should click on the tabs below the line File, Method, Help. For example, to load the data coming from an excel file you have to click the following sequence: File > Open > xlsx. The package gives also the option to open files in *.csv or *.txt format. To view the data click File > View, the result is shown in the background of Figure 3. To start the analysis the package requires that the parameters of the analysis are defined. This is done by clicking Method > Parameters. Figure 3 shows the window that opens at this stage, you have to define the names of the columns that correspond to Assessor, Product and Replicate. In this example Assessor corresponds to Panelist_Diplay_Name, Product to Sample_Name, and Replicate to Session_Number. In this example the data contains a total of 3 replicates. The package allows unreplicated data, if this is the case choose Null in the Replicate window. The data has to be balanced, which means that for each combination of assessor-product the number of replicates has to be the same and contain no missing observations.

6

Figure 3. Display of data (background of figure) and window for definition of parameters. The program makes an exploratory check on the data using a schematic diagram (Koopmans, 1987) of all of the attributes. This plot is useful to detect a bad value, for example a negative value. Such values may severely affect the Consonance Analysis. This analysis is also useful to detect a suspicious outlying value. To perform this analysis click Method > Descriptive General. Figure 4 gives the schematic diagram for this data set. There are no negative values and there are no suspiciously large observations.

Figure 4. Schematic diagram for the twenty two attributes Once the previous check is done we proceed to perform the Consonance Analysis, to do this click the following sequence: Method> Consonace.analysis > Assessors.times.variables > Consonance which produces the result shown in Figures 5 and 6, these figures presented here are in shades of grey, CONS presents them in colors. Also the lettering system shown in Figure 6 is just for the sake of clarity in this presentation in shades of grey, CONS just uses

7

colors.

Figure 5. Consonance Metric. Twenty two attributes

Figure 6. Variance Accounted For Metric. Twenty two attributes From Figure 5, we observe that attribute 4 (D) is the closest to being unidimensional with the largest value of Consonance, followed by attribute 15 (O). If we identify these attributes, both correspond to sourness. The first is at time one minute (Cons=15.18) and the next is sourness at time five minutes (Cons=3.57). This result tells us that sourness is

8

the attribute with best panel agreement independently of the time it is tasted. The next largest attribute, relative to Consonance, is At 9: Salivation_01 (I) (Cons=1.54). The rest of the attributes show a very low panel agreement and very small values for Consonance (all values less than 1). From Figure 6, we observe that only Sourness_01 (D), Sourness_05 (O), and Salivation_01 (I) have the largest VAF for the first dimension. And also we observe that for the rest of the attributes the first VAF is only around 20% and showing non unidimensionality for the other VAF values, which indicates (almost) random data. The loadings plots are useful to detect if any of the judges is deviating from the rest on a certain attribute, thus making it not unidimensional. CONS plots the loadings of four attributes at a time. To obtain these plots enter the tab labeled Consonance and click the button Loading Plots. Then a window opens and you can choose 4 attributes to plot. This is shown in Figure 7.

Figure 7. Display of Loadings plot selection Figure 8 shows the loadings for Sourness_01, which is the closest to unidimensionality. We can see that except for Judge Lori, the rest of the judges are very close and pointing basically in the same direction. The plot on the right shows the scores from the PCA for the N products, where the same product is joined with straight lines. This plots help evaluate the consistency of the scoring with respect to each attribute separately. For this attribute we can observe that two groups clearly show. It would be interesting to find out the composition of both groups and to find out why this separation happened. There is no evidence of a clear systematic pattern between the three samples of each product.

9

Figure 8. Loadings plot At 4: Sourness_01 Figure 9 shows the loadings for Sticky_01, which is one of the least unidimensional. From here we can see the lack of agreement of the judges because the arrows go practically in all directions, it would be useful to find out what is causing this disagreement. This gives an opportunity to find out why this situation is happening and then to improve the agreement of the panel. The right hand plot of Figure 9 shows a lot of overlap of the products, there seems to be no clear systematic difference.

Figure 9. Loadings plot At 1: Sticky_01 Figure 10 shows the loadings plot for Fresh Mouth_01, which is also an attribute with very low consonance. The spread of directions in the arrows is much larger that the spread shown in Figure 8. The right hand plot of Figure 10 shows a lot of overlap of the products, there seems to be no clear systematic difference. This analysis could be repeated for the rest of the attributes to find groups of judges more similar than others,

10

or to pinpoint if there are any assessors that behave badly on a certain attribute. Or to find out which attributes should be improved in their definition. From the right panel of this figure we see that there is no evidence of a clear systematic pattern between the three samples of each product.

Figure 10. Loadings plot At 8: Fresh Mouth_01 5. Conclusions Consonance analysis gives a graphical perspective which makes it very easy to evaluate the concordance of judgements or evaluations from a panel based on quantitative metrics. We have presented a graphical interface that performs consonance analysis on a three way data set that represents the evaluation of attributes of a panel on a set of products. The interface has the characteristics of being very flexible, general and very easy to use. Acknowledgements The first author gratefully acknowledges financial support from the Asociación Mexicana de Cultura, A.C. The second author acknowledges the financial support from the Mexican Council of Science and Technology (CONACYT, Scholarship 421460). References Arnold, G. M., & Williams, A. A. (1985). The Use of Generalised Procrusters Analysis in Sensory Analysis. In J. R. Piggot (Ed.), Statistical Procedures in Food Research. North Holland, Amsterdam. Banfield, C. F., & Harries, J. M. (1975). A Technique for Comparing Judges' Performance in Sensory Tests. Journal of Food Technology, 10, 1-10. Dijksterhuis, G. B., & Gower, J. C. (1991/92). The Interpretation of Generalised Procrusters Analysis and Allied Methods. Food Quality and Preference, 3, 67-87.

11

Dijksterhuis, G. B. (1995). Assessing Panel Consonance. Food Quality and Preference, 6 (1), 7-14. Dijksterhuis, G. B. (2004). Multivariate data Analysis in Sensory and Consumer Science. Food and Nutrition Press, Inc. USA, Trumbull Connecticut, (Chapter 2). Gower, J. C. (1975). Generalised Procrusters Analysis. Psychometrika, 40, 33-51. Huerta, N. S., Aguirre, V. M., & Lopez, M. T. (2017). CONS: Consonance Analysis with R. R package version 0.1.1, URL https://CRAN.Rproject.org/package=CONS/ Accessed 18.07.17 Koopmans, H. K. (1987). Introduction to Contemporary Statistical Methods. (2nd ed), Duxbury Press, Boston, (Chapter 3). Le, S., Josse, J., & Husson, F. (2008). FactoMineR: An R Package for Multivariate Analysis. Journal of Statistical Software, 25 (1), 1-18. DOI: 10.18637/jss.v025.i01. R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. R version 3.3.2 (2016-1031), URL https://www.R-project.org/ Accessed 18.07.17 van der Burg, E. (1988). Nonlinear Canonical Correlation and Some Related Techniques. DSWO press, Leiden. van der Burg, E., & Dijksterhuis, G. B. (1989). Nonlinear Canonical Correlation Analysis of Multiway Data. In R. Coppi, & S. Bolasco (Eds.) Multiway Data Analysis, 245-255. North-Holland, Elsevier Science Publishers B.V. Verzani, J. (2014a). gWidgets: gWidgets API for building toolkit-independent,interactive GUIs. Based on the iwidgets code of Simon Urbanek, suggestions by Simon Urbanek, Philippe Grosjean and Michael Lawrence. R package version 0.0-54. URL https://CRAN.R-project.org/package=gWidgets Accessed 16.11.16. Verzani, J. (2014b). gWidgetstcltk: Toolkit implementation of gWidgets for tcltk package. R package version 0.0-55. URL https://CRAN.Rproject.org/package=gWidgetstcltk/ Accessed 16.11.16.

12

Highlights  An easy to use graphical interface oriented to perform Consonance Analysis  Consonance Analysis: a multivariate method to evaluate a sensory panel performance  The interface makes use of R, an open source statistical package  The package produces plots of the Consonance and Variance Accounted For metrics  The package also produces loading plots

13