Comparing the results of nonmetric multidimensional scaling and principal components analysis


J BUSN RES 1988:17:5-14


Joseph L. Balloun
A. Ben Oumlil
University of Dayton

The purposes of this study are 1) to propose methods for assessing the effectiveness of different dimensional analysis methods, and 2) to examine evidence in one sample for the relative effectiveness of principal components analysis and nonmetric multidimensional scaling of marketing survey data. Criteria for the quality of dimensional solutions are defined from a measure of simple structure. The criteria are demonstrated in one marketing research panel data set. In this data set, neither the type of proximity measure (Pearson or gamma correlation coefficients) nor the dimensional analysis technique (principal components or nonmetric multidimensional scaling) significantly affected any of the criteria of solution quality.

Address correspondence to: Joseph Balloun, Department of Marketing, University of Dayton, 300 College Park Avenue, Dayton, OH 45469-0001.
Journal of Business Research 17, 5-14 (1988). © 1988 Elsevier Science Publishing Co., Inc., 52 Vanderbilt Ave., New York, NY 10017. 0148-2963/88/$3.50

Introduction

All dimensional analysis methods describe the structure of objects or variables. They start with proximity measurements among pairs of objects, pairs of variables, or object-variable pairs, and represent structure by spatial coordinates. They differ in their choice of proximities, data acquisition methods, and in the relationships assumed between the proximities and the latent spatial coordinates (Davison, 1985).

Nonmetric multidimensional scaling (NMDS) and factor analysis (FA) are dimensional analyses. They have wide applicability to the study of marketing. Comparisons of FA and NMDS have been few and muddled because they often confound differences between the algorithms with differences in the types of proximity measures and in the data acquisition methods (Davison, 1983).

There have been theoretical discussions and some direct comparisons of FA and NMDS. Schiffman et al. (1981) argued for the experimental, mathematical, and interpretative advantages of NMDS over FA. Shepard (1972), Schlessinger and Guttman (1969), and Lingoes (1971) concluded that NMDS yields more parsimonious solutions than does FA. MacCallum (1974) argued that the FA model is richer than NMDS because it describes individual differences among objects, and NMDS does not. MacCallum’s observation applies to some but not all NMDS algorithms.

Objectives of the Study

The purposes of this study are 1) to propose methods for assessing the relative effectiveness of different dimensional analyses, and 2) to examine evidence for the relative effectiveness of principal components analysis and nonmetric multidimensional scaling in a market panel survey.

The Nature of Dimensional Analysis Methods

To reduce verbiage, we use exploratory factor analysis language to refer to similar ideas in all dimensional analysis techniques. Thus, in the rest of this paper, “communality” means the sum of the squared projections of a variable on the latent dimensions; a “factor” refers to a latent dimension; a “loading” refers to the projection of a variable on a latent dimension; “factor size” refers to the sum of the squared projections on a single latent dimension. “Proximity” refers to any measure used to assess the “distance” or “similarity” of two variables (Kruskal and Wish, 1978). There are other dimensional analysis techniques that are related less closely to NMDS or FA. They yield descriptions of variables or objects at the nominal or ordinal levels of measurement. These techniques are not within the scope of this paper. Examples include latent class analysis (Anderson, 1954), latent structure analysis (Anderson, 1954), cluster analysis of variables (Tryon and Bailey, 1970), and nonmetric multidimensional unfolding analysis (Coombs, 1964).
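The definitions of "communality" and "factor size" given above reduce to row and column sums of squared loadings. The following is a minimal sketch of that arithmetic, assuming a small made-up loading matrix (four variables on two latent dimensions); the numbers and the function names are illustrative and do not come from the study.

```python
import numpy as np

# Hypothetical loadings: 4 variables (rows) on 2 latent dimensions (columns).
loadings = np.array([
    [0.8, 0.1],
    [0.7, -0.2],
    [0.1, 0.9],
    [-0.3, 0.6],
])


def communalities(loadings):
    """Sum of squared projections of each variable on all latent dimensions."""
    return np.sum(np.asarray(loadings, dtype=float) ** 2, axis=1)


def factor_sizes(loadings):
    """Sum of squared projections of all variables on each latent dimension."""
    return np.sum(np.asarray(loadings, dtype=float) ** 2, axis=0)


h2 = communalities(loadings)   # one value per variable
sizes = factor_sizes(loadings)  # one value per factor
```

Both quantities partition the same total: the communalities and the factor sizes each sum to the overall sum of squared loadings.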

Choice of Proximities Among Variables

Dimensional analysis methods used to describe variables usually start with numerical proximities between each pair of variables. Proximities include Pearson correlations, rank order correlations, probabilities, judgments of perceived similarity or difference, or rank orderings of variables by perceptual continua. Researchers tend to use specific proximity measures with each dimensional analysis method. Pearson correlations are most frequent in applications of FA, and ratings of perceived similarity often are input to NMDS algorithms. It is possible to analyze a variety of proximity measures with different dimensional analysis methods. Whether the choice of the proximity measure makes a difference in the dimensional solutions is an empirical problem (Kruskal and Wish, 1978).
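To make the idea of swapping proximity measures concrete, the sketch below builds two proximity matrices from the same ratings: one from Pearson correlations and one from the Goodman-Kruskal gamma coefficient, the two measures compared in this paper. The ratings are made up for illustration, and the function names are assumptions of this sketch, not part of any package used in the study.

```python
import numpy as np


def goodman_kruskal_gamma(x, y):
    """Gamma = (concordant - discordant) / (concordant + discordant); tied pairs are ignored."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Sign of the difference for every pair of cases, on each variable.
    product = np.sign(x[:, None] - x[None, :]) * np.sign(y[:, None] - y[None, :])
    upper = np.triu_indices(len(x), k=1)  # count each pair of cases once
    concordant = np.sum(product[upper] > 0)
    discordant = np.sum(product[upper] < 0)
    if concordant + discordant == 0:
        return float("nan")
    return float((concordant - discordant) / (concordant + discordant))


def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])


def proximity_matrix(data, measure):
    """Apply a pairwise measure to every pair of columns (variables)."""
    data = np.asarray(data)
    n_vars = data.shape[1]
    matrix = np.eye(n_vars)
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            matrix[i, j] = matrix[j, i] = measure(data[:, i], data[:, j])
    return matrix


# Hypothetical 6-point ratings: 6 respondents (rows) by 3 items (columns).
ratings = np.array([
    [1, 2, 6],
    [2, 1, 5],
    [2, 3, 5],
    [3, 3, 4],
    [5, 4, 2],
    [6, 6, 1],
])

gamma_matrix = proximity_matrix(ratings, goodman_kruskal_gamma)
pearson_matrix = proximity_matrix(ratings, pearson)
```

Either matrix could then be passed to the same dimensional analysis routine, which is what allows the proximity measure to be treated as an experimental factor.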

Data Acquisition Procedures

Specific methods of generating data tend to be used with each dimensional analysis method. In FA, the data tend to be attributes of objects (e.g., people’s responses to survey questions). NMDS often starts from judgments of similarity or dissimilarity of pairs of objects or variables.

Relationship Between Proximities and Factor Loadings

Factor Analysis Methods

FA methods assume that scores of objects on variables are measured at the interval level. The FA algorithms regard the correlation as the scalar product of the vectors representing two variables in space. FA algorithms decompose the matrix of scalar products to yield a spatial representation of relationships among the variables.

Consider a fictional example of three objects (or variables) in a one-factor space. Let the loadings of objects A, B, and C be respectively 1, 2, and 5 on the factor. The distances among the points can be calculated. In FA, the inter-point distances observed are transformed to scalar products, or the scalar products (e.g., correlation coefficients) are calculated from raw data. The scalar products can be computed from the interpoint distances by procedures given in Torgerson (1958). FA methods compute the projections of the points on the factors from the scalar products. Points A, B, and C respectively have Thurstone centroid loadings of 1.67, .67, and -2.33. The calculated loadings are a linear transformation of the original loadings: the original values centered on their mean of 2.67 and reflected in sign.
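The three-point example can be checked numerically. The sketch below converts the interpoint distances to scalar products by double centering, in the spirit of the Torgerson (1958) procedure, and then extracts one dimension. It assumes an eigendecomposition in place of the Thurstone centroid method named in the text, so it recovers the same magnitudes (1.67, .67, 2.33), but the overall sign of the recovered dimension is arbitrary. Function names are illustrative.

```python
import numpy as np


def scalar_products_from_distances(distances):
    """Double-center the squared distances to obtain scalar products about the centroid."""
    squared = np.asarray(distances, dtype=float) ** 2
    n = squared.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * centering @ squared @ centering


def loadings_from_scalar_products(products, n_factors=1):
    """Project the points on the leading dimensions of the scalar product matrix."""
    eigvals, eigvecs = np.linalg.eigh(products)
    top = np.argsort(eigvals)[::-1][:n_factors]
    # Clip tiny negative eigenvalues caused by rounding before taking the root.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


original = np.array([1.0, 2.0, 5.0])  # loadings of A, B, C from the text
distances = np.abs(original[:, None] - original[None, :])

products = scalar_products_from_distances(distances)
recovered = loadings_from_scalar_products(products)[:, 0]
```

Because the recovered values are the centered originals up to sign, their correlation with the original loadings is 1 in absolute value, which is the sense in which they are a linear transformation of the original loadings.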

Nonmetric Multidimensional Scaling Methods

NMDS assumes that the proximities represent ordinal information about distances among variables. NMDS methods do not assume intervality in the original variables or in the proximities. NMDS methods estimate interval level projections of the data points on the factors from the observed proximities of data points. NMDS operates by estimating an initial factor structure matrix. Then NMDS algorithms gradually modify the factor structure matrix until the rank orders of the observed distances among the data points optimally match the distances calculated from their loadings. Early NMDS algorithms started from a random factor structure matrix for a given number of factors. Coombs (1964) and others start from better initial approximations. From the rank orders of the interpoint distances among points A, B, and C, points A and C are the extremes of the factor. Point B must be closer to A than to C. One adequate NMDS solution would give A, B, and C loadings of 1, 2, and 4. Coombs (1964) and others have argued that when there are many data points relative to the number of factors, the factor loadings estimated by NMDS will approach intervality.
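The claim that loadings of 1, 2, and 4 are an adequate NMDS solution for points originally at 1, 2, and 5 rests only on the rank order of the interpoint distances. A short sketch of that check follows; the helper name is an assumption of this example, and the third configuration is a made-up counterexample.

```python
from itertools import combinations


def distance_order(loadings):
    """Return the pairs of points sorted from the closest pair to the farthest pair."""
    pairs = list(combinations(sorted(loadings), 2))
    distances = {pair: abs(loadings[pair[0]] - loadings[pair[1]]) for pair in pairs}
    return sorted(pairs, key=distances.get)


original = {"A": 1, "B": 2, "C": 5}       # loadings used in the FA example
nmds_solution = {"A": 1, "B": 2, "C": 4}  # the adequate NMDS solution from the text
poor_solution = {"A": 1, "B": 3.5, "C": 4}  # hypothetical: B is now closer to C than to A

same_order = distance_order(original) == distance_order(nmds_solution)
poor_order = distance_order(original) == distance_order(poor_solution)
```

Only the ordering is compared, so any configuration that keeps A and C as the extremes and B nearer to A passes the check, which illustrates why the solution is determined only loosely when there are few points.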

Which Method is Best?

The fundamental problem in evaluating different types of dimensional analysis techniques is the lack of generally applicable performance standards. Ideally, the population parameter values would provide standards of comparison. However, we restricted the scope of the study to comparing the relative performance of NMDS and FA in one sample. We define various criteria for the adequacy of a dimensional analysis, and these are shown in Table 1. These may not all be maximized under the best of circumstances, for high performance by one criterion may reduce the performance of an algorithm in another area. Different aspects of performance are shown in the first column of Table 1. Because dimensional analysis methods differ considerably, their results can be compared only under certain conditions, and these are given in the second column. The third column of Table 1 shows the specific method used in this study.

Table 1. Conditions for Comparing Results Obtained from Different Dimensional Analysis Techniques for the Same Data Set

What is to be Compared? | Conditions Allowing Comparability | Method Used in This Paper
Number of factors | Common criteria for estimating the number of factors | The jackknife t-test for increase in the intermax simple structure measure
Goodness of fit | Common index of goodness of fit | Pearson correlations of original and reproduced proximities, and t-test for mean difference between them
Simple structure | A common index of simple structure | The Intermax criterion
Relation to univariate characteristics | A common index of relationship between variables’ loadings and their univariate characteristics | Spearman rank order correlations of variables’ loadings and squared loadings with univariate data counts, means, and standard deviations
Factor loadings | A common scale of measurement of the factor loadings | Scaling and alignment of the factors to the same metric
Factor sizes | A common scale of measurement of the factor loadings | Scaling and alignment of the factors to the same metric

First, there should be a high positive linear correlation between the original and reproduced proximities, and their means should be highly similar. Second, the sample solution should contain the same number of factors as the population solution. In one sample, different methods should give the same number of factors. Third, the solution should maximize simple structure. Fourth, the loadings of the variables on the factors should be independent of the univariate characteristics of each variable. Fifth, the factor loadings produced by the different dimensional analyses should be similar. Sixth, different dimensional analyses should concur on the relative sizes of the rotated factors.
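The fourth criterion can be illustrated with a Spearman rank order correlation between variables' loadings on a factor and a univariate characteristic such as the item mean. The sketch below is a plain NumPy version that assumes made-up loadings and means for five variables; a value near zero would indicate the independence that the criterion asks for.

```python
import numpy as np


def average_ranks(values):
    """Rank from 1 upward; tied values share the mean of the ranks they occupy."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for value in np.unique(values):
        tied = values == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman rank order correlation: the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Hypothetical loadings of five variables on one factor, and their item means.
factor_loadings = [0.80, 0.50, 0.10, -0.20, -0.60]
item_means = [4.1, 3.2, 3.9, 2.5, 2.8]

rho = spearman(factor_loadings, item_means)
```

In this invented example the correlation is high, which would signal that the factor partly reflects the items' means rather than only their multivariate structure.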

Method

Description of the Sample

Data were gathered via a research instrument mailed to members of the Arkansas Household Research Panel (AHRP). The Consumer Panel consists of over 750 households selected from various geographic units in the state of Arkansas and provides coverage of every city in the state having a population of over 5,000. The mailing of the questionnaire was conducted during the Spring and Summer of 1983. The mailing resulted in 496 usable questionnaires out of 512 mailed out, for a response rate of 97%. The high rate of return occurred because of increased cooperation associated with a volunteer consumer panel (McGrown, 1979).


Data Acquisition Method

The research instrument is described in Oumlil (1983). Fourteen statements were developed on consumers’ perceptions and expectations of the economy. The statements are listed in the note to Table 2. These statements were rated on a 6-point Likert scale. The data acquisition procedure was held constant.

Proximities Used

The Pearson correlation was chosen because of its high frequency of use and because of its well-known restrictive assumptions about the level of measurement of variables and about the nature of relationships among variables. The gamma correlation coefficient (Liebetrau, 1983) was chosen because it makes much weaker assumptions about the nature of the variables and about the nature of their relationship. The gamma correlation coefficient assumes that the variables are measured at the ordinal or higher level, and that their relationship is weakly monotonic.

Dimensional Analysis Techniques

Principal components analysis was chosen as the “factor analysis” technique because it is often used and because it is the simplest such method. The ALSCAL routine of SPSS-X (Anonymous, 1986) was used for the NMDS solutions. The gamma or Pearson correlation matrices were used as unweighted input similarities, and the nonmetric multidimensional scaling method according to Kruskal was used to find the NMDS solutions.

Transformations of the Dimensional Solutions

Most researchers use either some form of factor analysis or nonmetric multidimensional scaling, but not both. Hence, a reasonable comparison of the solutions obtained with different correlation coefficients or different dimensional analysis techniques should not involve rotation of one solution to resemble the other. The solutions may appear to be different for nonsubstantive reasons. The loadings are at the interval level, and, hence, they may differ in means or standard deviations. Rotated solutions may give similar factor structures, but the order and sign of the factors may differ. Therefore, transformations were done to align (match up the factors in appropriate sequence) and rescale (transform matching factors to the same sign, mean and variance) the solutions. The jackknife technique was used to estimate sampling variances of the rotated factor loadings. The odd and even numbered cases by sequence formed two subgroups for the jackknife. See Balloun and Oumlil (1986) for a brief discussion of jackknifing and alignment of rotated factor structures.

Data Analysis Methods

The study was analyzed as a four-way, completely crossed factorial analysis of variance. The Pearson and gamma correlations are the levels of the proximity factor. NMDS and FA are the levels of the dimensional analysis factor. Variables and factors are the other two independent variables. The raw and squared factor loadings are the two main dependent variables. Several different measures of the quality of the dimensional solutions were created as outlined in Table 1. These measures provide a reasonable set of indices of performance of dimensional analysis techniques.

Results

Number of Dimensions

The Pearson and gamma correlations among the variables are shown in Table 2. Inspection suggests that a two- or three-factor solution may be appropriate. The number of factors was estimated by the intermax jackknife t-test for gain in simple structure (Balloun and Oumlil, 1986). The jackknife t-test was performed separately for each of the combinations of correlations and dimensional analyses. In two of the four conditions, the two-factor solution is most appropriate. The two factor solutions are used in the remaining analyses.

Reproduction of the Proximities

Goodness of fit of the reproduced to the original proximities was measured by a) the Pearson correlation between the original and reproduced correlations, and b) the t-test for difference in means of the original and reproduced proximities. The Pearson correlation between the original and reproduced Pearson and gamma correlations when analyzed by principal components is, respectively, .97 and .93. When the Pearson and gamma correlations were analyzed by the NMDS method, the respective correlations of the reproduced and original proximities were .94 and .96.
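For the principal components case, the two goodness-of-fit indices can be sketched as follows. The correlation matrix is made up for illustration, the reproduced correlations are taken as the loadings times their transpose for the retained components, and the t statistic is computed as a paired test on the off-diagonal entries. The paired form is an assumption of this sketch; the text does not say which form of the t-test was used.

```python
import numpy as np


def reproduce_correlations(corr, n_components):
    """Reproduce a correlation matrix from its leading principal components."""
    eigvals, eigvecs = np.linalg.eigh(np.asarray(corr, dtype=float))
    top = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
    return loadings @ loadings.T


def fit_indices(original, reproduced):
    """Pearson r and paired t between original and reproduced off-diagonal proximities."""
    lower = np.tril_indices_from(original, k=-1)
    o, r = original[lower], reproduced[lower]
    r_fit = float(np.corrcoef(o, r)[0, 1])
    diffs = r - o
    t_stat = float(diffs.mean() / (diffs.std(ddof=1) / np.sqrt(len(diffs))))
    return r_fit, t_stat


# Hypothetical correlations among four variables forming two loose clusters.
corr = np.array([
    [1.0, 0.6, -0.2, -0.3],
    [0.6, 1.0, -0.1, -0.2],
    [-0.2, -0.1, 1.0, 0.5],
    [-0.3, -0.2, 0.5, 1.0],
])

reproduced = reproduce_correlations(corr, n_components=2)
r_fit, t_stat = fit_indices(corr, reproduced)
```

For an NMDS solution the reproduced proximities would instead be derived from the fitted configuration, but the same two indices could be applied to them.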

Simple Structure

Intermax measures of the degree of simple structure were computed within each of the four combinations of correlations and dimensional analyses. The simple structures were not significantly different from each other by the two-tailed jackknife t-tests with nominal alphas of .05.
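The jackknife t-tests rest on the two subgroups formed from odd and even numbered cases. The sketch below shows one standard way to build such a test from pseudo-values (the usual delete-a-group jackknife), assuming a generic statistic and made-up data; it is not taken from the authors' program, and the exact computations in Balloun and Oumlil (1986) may differ. With two groups the test has one degree of freedom, for which the two-tailed .05 critical value of Student's t is 12.706.

```python
import numpy as np

T_CRITICAL_DF1 = 12.706  # two-tailed .05 critical value of Student's t with 1 df


def grouped_jackknife(data, statistic, n_groups=2):
    """Delete-a-group jackknife: returns (estimate, standard error, degrees of freedom)."""
    data = np.asarray(data)
    groups = np.arange(len(data)) % n_groups  # alternate cases by sequence
    theta_all = statistic(data)
    pseudo_values = np.array([
        n_groups * theta_all - (n_groups - 1) * statistic(data[groups != g])
        for g in range(n_groups)
    ])
    estimate = pseudo_values.mean()
    std_error = pseudo_values.std(ddof=1) / np.sqrt(n_groups)
    return float(estimate), float(std_error), n_groups - 1


# Hypothetical data; in the study the statistic would be a loading or a simple structure index.
cases = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
estimate, std_error, df = grouped_jackknife(cases, np.mean)
t_stat = estimate / std_error
significant = abs(t_stat) > T_CRITICAL_DF1
```

With only one degree of freedom the critical value is very large, so a test of this kind has little power, which is worth keeping in mind when reading the nonsignificant differences reported here.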

Factor Loadings, Communalities, and Factor Sizes

The intermax rotated solutions under each of the four combinations of dimensional analysis methods and correlation coefficients are shown in Table 3. There is a high degree of similarity among the loadings under all four combinations of dimensional analysis method and correlation coefficient.
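Loadings from different methods are comparable only after the alignment and rescaling described under Transformations of the Dimensional Solutions. The sketch below is one simple way to do this under stated assumptions: factors are matched by the largest total absolute correlation with a reference solution, reflected to the same sign, and then given the reference factor's mean and variance. The matching rule, the function name, and the example matrices are assumptions of this sketch rather than the authors' procedure.

```python
import numpy as np
from itertools import permutations


def align_and_rescale(reference, other):
    """Reorder, reflect, and rescale the columns of `other` to match `reference`."""
    reference = np.asarray(reference, dtype=float)
    other = np.asarray(other, dtype=float)
    n_factors = reference.shape[1]
    # Cross-correlations: rows are reference factors, columns are factors of `other`.
    cross = np.corrcoef(reference.T, other.T)[:n_factors, n_factors:]
    best = max(
        permutations(range(n_factors)),
        key=lambda perm: sum(abs(cross[i, perm[i]]) for i in range(n_factors)),
    )
    aligned = np.empty_like(reference)
    for i, j in enumerate(best):
        sign = -1.0 if cross[i, j] < 0 else 1.0
        column = sign * other[:, j]
        z = (column - column.mean()) / column.std()
        aligned[:, i] = z * reference[:, i].std() + reference[:, i].mean()
    return aligned


# Hypothetical reference loadings: 5 variables on 2 factors.
reference = np.array([
    [0.8, -0.2],
    [0.6, -0.1],
    [0.1, 0.7],
    [-0.3, 0.6],
    [0.5, 0.0],
])
# The same structure with factors swapped, one reflected, and different scales.
other = np.column_stack([-2.0 * reference[:, 1] + 0.3, 0.5 * reference[:, 0] - 0.1])

aligned = align_and_rescale(reference, other)
```

Note that this transformation removes differences in factor order, sign, mean, and variance by construction, so those effects can no longer show up in later comparisons of the loadings.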

Table 2. Pearson and Gamma Correlations for Expectation and Perception Questions (a)

Item | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14
1 | 100 | 29 | 44 | 58 | 60 | 32 | -16 | -27 | -24 | -24 | -23 | -42 | -10 | 42
2 | 34 | 100 | 12 | 23 | 30 | 03 | -03 | -20 | -22 | -24 | -12 | -23 | -13 | 22
3 | 50 | 08 | 100 | 44 | 39 | 17 | 02 | -12 | -11 | -09 | -11 | -27 | -11 | 15
4 | 60 | 22 | 41 | 100 | 67 | 39 | -22 | -30 | -30 | -28 | -33 | -37 | -10 | 32
5 | 60 | 38 | 34 | 59 | 100 | 35 | -21 | -30 | -35 | -31 | -31 | -43 | -11 | 38
6 | 28 | 01 | 12 | 31 | 30 | 100 | -39 | -27 | -18 | -21 | -37 | -13 | 15 | 17
7 | -19 | -04 | -02 | -28 | -19 | -38 | 100 | 43 | 23 | 28 | 54 | 17 | -07 | -09
8 | -27 | -19 | -12 | -32 | -27 | -18 | 43 | 100 | 40 | 41 | 40 | 36 | 10 | -22
9 | -26 | -24 | -12 | -27 | -33 | -10 | 23 | 40 | 100 | 59 | 46 | 40 | 15 | -22
10 | -27 | -30 | -06 | -19 | -29 | -15 | 33 | 33 | 51 | 100 | 58 | 44 | 12 | -17
11 | -26 | -15 | -06 | -28 | -26 | -26 | 53 | 42 | 45 | 59 | 100 | 40 | 01 | -19
12 | -40 | -33 | -22 | -37 | -45 | -08 | 11 | 33 | 46 | 50 | 43 | 100 | 28 | -28
13 | -08 | -16 | -12 | -14 | -09 | -12 | 00 | 06 | 18 | 15 | a2 | 32 | 100 | 01
14 | 36 | 42 | 08 | 17 | 39 | 12 | 06 | -16 | -27 | -23 | -20 | -45 | -07 | 100

(a) Pearson correlations are below the diagonal, and gamma correlations are above the diagonal. Decimals have been omitted. The fourteen perception and expectation items are as follows:
1. Because of the economic situation during the past few years, as a family, we are very concerned about maintaining our standard of living.
2. During the past few years, our family had to make many adjustments to maintain our standard of living.
3. During the past few years, I had to pay higher prices for products and services.
4. Because of the increased cost of living, our family members have taken occasional employment.
5. Overall, our family has become worse off economically during the past few years.
6. It is harder to make ends meet.
7. I am pessimistic about obtaining a higher standard of living.
8. I realize that I will be able to improve my economic position in the future.
9. I have faith in the economy.
10. As a consumer, I am more happy than I used to be.
11. It will be easier to make financial plans in the future.
12. I will probably have more money to spend in the future than I have now.
13. The changed economic condition had no influence on my family’s purchasing abilities.
14. I think that my economic situation will remain the same in the future.

Table 3. The Intermax-Rotated Two Dimensional Solutions after Alignment and Scaling (a)

Question | FA (b) Pearson I | FA Pearson II | FA Gamma I | FA Gamma II | NMDS (c) Pearson I | NMDS Pearson II | NMDS Gamma I | NMDS Gamma II
1 | 77* | -20 | 78* | -21 | 58 | -29 | 58 | -29*
2 | 55 | -12 | 53 | -16 | 81* | 03 | 81 | 03
3 | 59* | 06* | 60* | 06 | 60 | -10 | 60 | -10
4 | 65* | -29* | 66* | -29 | 52 | -34 | 52* | -34
5 | 75* | -25 | 75 | -26 | 58* | -29 | 58 | -29
6 | 14 | -41 | 19 | -39 | 30 | -41 | 30 | -41*
7 | 09 | 77* | 06 | 74* | 11 | 77* | 11* | 77*
8 | -19 | 64* | -20* | 64* | -13 | 74* | -13 | 74*
9 | -32 | 58* | -32* | 59 | -37* | 54* | -37* | 54
10 | -23 | 69* | -23 | 70 | -32* | 57* | -32* | 57
11 | -11 | 83 | -11 | 83 | -13* | 75* | -13 | 75*
12 | -60 | 40 | -60 | 42 | -40* | 52* | -40* | 52
13 | -30 | 01 | -30 | 06 | -49 | 14 | -49 | 14
14 | 58 | -11 | 56 | -13 | 72 | -03 | 72 | -03

* Significantly different from zero by the two-tailed jackknife t-test with one degree of freedom and alpha of .05.
(a) Decimals have been omitted.
(b) Principal Components method.
(c) Nonmetric multidimensional scaling.

Relation of Multivariate to Univariate Structure

Spearman rank order correlations were computed among the variables’ loadings on each of the two factors and their respective data counts, means and standard deviations under the four different combinations of correlations and dimensional analyses for the two-factor solution. There is a significant relationship of variables’ means to their loadings on both factors one and two. The choice of correlation coefficient and the type of dimensional analysis used have little impact on the size of the correlations between means and loadings.

Differences in the Simple Structure Achieved by Different Methods

The dimensional analysis method, the correlation coefficient, the factor, and the variable were considered independent variables. Certain effects were eliminated by the rescaling process. The error of estimation of the loadings was greater for the NMDS method. Variables differed in their average raw loadings (alpha = .05). There was a significant (.05) four-way interaction for the raw loadings. For both the raw and squared loadings, the factor by variable interaction effect was significant (.05). The pattern of results of both analyses shows that neither the method nor the correlation coefficient significantly affects the loadings, the squared loadings, or the degree of simple structure (i.e., the factor by variable interaction).

Discussion

Differences Among Methods and Correlation Coefficients

The number of factors detected differed when different combinations of correlation coefficients or dimensional analyses were used. Possibly some other general criteria for the number of factors may reveal greater similarity in the number of factors found for different dimensional analysis methods. The two-dimensional factor structures found do not differ when the dimensional analysis method or the correlation coefficient is different. In this sample, both NMDS and FA yield “the same message” about the configuration of variables. In many scientific fields, agreement among different methods applied to the same phenomena is generally taken to indicate validity.

Implications for Future Research

Research on the comparative effectiveness of different dimensional analysis methods is an important but neglected topic. One of the most important contributions of this paper is the introduction of improved criteria or dependent variables for measuring the effectiveness of dimensional analyses. The intermax criteria for the number of factors can be applied to the results of any dimensional analysis method that yields interval level loadings on factors. This paper introduces the idea of considering the choices of proximity measure, data acquisition methods, or dimensional analysis method as independent variables. The methods proposed in this paper allow the researcher to directly assess the “goodness of fit” of the sample solutions obtained under a variety of conditions to known population parameters. The empirical results of this introductory study are necessarily limited. We compared two different correlations and two different specific types of dimensional analysis techniques in one sample. Within the limitations of the study, the type of correlation and the type of dimensional analysis make no difference in the rotated factor structure. These results are reminiscent of those in other multivariate analyses. For example, in multiple regression, parameter estimates tend to be “overly elegant” for the nature of the data (Wainer, 1976; Wherry, 1975). Future research could broaden the range of a) data acquisition methods, b) proximities, and c) dimensional analysis techniques applied to the data. The accuracy of estimates of various population parameters should be estimated under a variety of conditions. The effectiveness of different dimensional analysis algorithms should be compared over a representative and larger sample of data sets.

References

Anderson, Thomas W., On Estimation of Parameters in Latent Structure Analysis, Psychometrika 19 (March 1954): 1-11.
Balloun, Joseph L., and Oumlil, A. Ben, A Program for Intermax Rotation in Orthogonal Factor Analysis, Behavior Research Methods, Instruments and Computers 18 (June 1986): 331-336.
Coombs, Clyde H., A Theory of Data, Wiley, New York, 1964.
Davison, Mark L., Multidimensional Scaling, Wiley, New York, 1983.
Davison, Mark L., Multidimensional Scaling Versus Components Analysis of Test Intercorrelations, Psychological Bulletin 97 (January 1985): 94-105.
Kruskal, Joseph B., and Wish, Myron, Multidimensional Scaling. Sage University paper series on Quantitative Applications in the Social Sciences, 07-011. Sage, Beverly Hills, 1978.
Liebetrau, Albert M., Measures of Association, Sage, Beverly Hills, 1983.
Lingoes, J. C., Some Boundary Conditions for a Monotone Analysis of Symmetric Matrices, Psychometrika 36 (June 1971): 195-203.
MacCallum, Robert C., Relations Between Factor Analysis and Multidimensional Scaling, Psychological Bulletin 81 (August 1974): 505-516.
McGrown, M. L., Marketing Research: Text and Cases, Winthrop, Cambridge, 1979, pp. 67-71.
Oumlil, A. Ben, Economic Change and Consumer Shopping Behavior, Praeger, New York, 1983.
Schiffman, Susan S., Reynolds, M. Lance, and Young, Forrest W., Introduction to Multidimensional Scaling, Academic, New York, 1981.
Schlessinger, I. M., and Guttman, Louis, Smallest Space Analysis of Intelligence and Achievement Tests, Psychological Bulletin 71 (February 1969): 95-100.
Shepard, Roger N., Introduction to Volume I, in Multidimensional Scaling: Theory and Applications in the Behavioral Sciences, Vol. 1, Roger N. Shepard, A. Kimball Romney, and Sarah B. Nerlove, eds., Seminar, New York, 1972.
Torgerson, Warren S., Theory and Methods of Scaling, Wiley, New York, 1958.
Tryon, Robert C., and Bailey, Daniel E., Cluster Analysis, McGraw-Hill, New York, 1970.
User’s Guide to SPSS-X, 2nd ed., McGraw-Hill, New York, 1986, pp. 752-775.
Wainer, Howard, Estimating Coefficients in Linear Models: It Don’t Make No Nevermind, Psychological Bulletin 83 (March 1976): 213-217.
Wherry, Robert J., Sr., Underprediction from Overfitting: 45 Years of Shrinkage, Personnel Psychology 28 (January 1975): 1-18.