Applied factor analysis in the natural sciences


Computers & Geosciences Vol. 20, No. 4, pp. 665-666, 1994 Elsevier Science Ltd. Printed in Great Britain

Pergamon

REVIEW

Applied Factor Analysis in the Natural Sciences by Richard Reyment and K. G. Jöreskog, Cambridge University Press, 1993, 371 p., $79.95 (US)

This book is the second edition of Geological Factor Analysis, originally written by Jöreskog, Klovan, and Reyment (1976) and reviewed by Butler (1977). The preface to the 1993 edition states that the book has been rewritten because of numerous requests and that the new title reflects an awareness of the wider audience that uses factor analysis methods in the natural sciences. As in the first edition, there are many geological examples. Some new examples have been included that represent data from the environmental and biological sciences.

Chapter 1 introduces a simple example of a multivariate data set and the application of principal component factor analysis to assess variability in the data. A new addition to this chapter is a list of examples within the natural sciences that make use of factor analysis methods.

Chapter 2, Basic mathematical and statistical concepts, is largely unchanged from the first edition. The chapter introduces matrix algebra and notation, descriptive statistics, correlation, covariance, regression, and coordinate system rotation. Eigenvalues and eigenvectors are introduced along with some useful properties of matrices that are relevant to factor analysis problems. The Eckart-Young theorem, one of the fundamental theorems relating Q-mode and R-mode factor analysis, is documented in detail. This chapter is an excellent summary of the basic matrix algebra that is used in factor analysis methods.

Chapter 3, Aims, ideas, and models of factor analysis, also is essentially unchanged from the first edition. The chapter discusses basic ideas and concepts associated with R-mode analysis. Two methods, principal components analysis and "true" factor analysis, are discussed. The authors make a distinction between the descriptive use and the inferential use of factor analysis. The fixed case involves the analysis of one sample without reference to its being derived from a larger collection of objects. The random case refers to a random sample that is assumed to be part of some specified population, and it allows inferences to be made about that population from the sample that has been analyzed. Principal components and true factor analyses then are discussed in terms of mathematical models.

Chapter 4, R-mode methods, describes the methods used to analyze the relationships between variables.
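As a rough illustration of the R-mode idea, and of the link to Q-mode analysis that the Chapter 2 summary attributes to a fundamental theorem, the following Python/NumPy sketch decomposes a small synthetic data matrix both ways. It is not taken from the book or its MATLAB appendix; the data and all variable names are hypothetical, and the scaling of the data matrix is an assumption chosen so that the variable-by-variable product equals the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data matrix: 8 objects (rows) by 3 variables (columns).
X = rng.normal(size=(8, 3))

# Standardize the columns, then scale so that Z.T @ Z is the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Z = Z / np.sqrt(X.shape[0] - 1)

# R-mode: eigenvalues of the variable-by-variable matrix (3 x 3).
r_matrix = Z.T @ Z
r_eigvals = np.linalg.eigvalsh(r_matrix)[::-1]

# Q-mode: eigenvalues of the object-by-object matrix (8 x 8).
q_matrix = Z @ Z.T
q_eigvals = np.linalg.eigvalsh(q_matrix)[::-1]

# A single singular value decomposition of Z gives both eigenvector sets:
# columns of U for the Q-mode matrix, rows of Vt for the R-mode matrix.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

# The nonzero eigenvalues of the two matrices coincide and equal s**2.
k = len(s)
nonzero_match = bool(
    np.allclose(r_eigvals[:k], q_eigvals[:k]) and np.allclose(r_eigvals[:k], s**2)
)
print(nonzero_match)
```

In this sketch the two modes share the same nonzero eigenvalues, so solving the smaller of the two eigenproblems is sufficient in practice.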

Principal components analysis and true factor analysis are discussed in the context of the fixed and random cases. New additions to this chapter include: robust principal components analysis; the use of M-estimators for deriving robust covariance matrices; cross-validation as a method of testing the effectiveness of principal components analysis; PCA applied to compositional data; path analysis; and Wrightian factor analysis. Cross-validation addresses questions such as: how many principal components can be usefully retained; can any variables be excluded on the grounds of redundancy; and which observations deviate in some multivariate manner from the main body of the data?

Chapter 5, Q-mode methods, has been rewritten with several new references to Q-mode analysis. The chapter includes a discussion of log-contrast PCA for compositional data and its extension into principal coordinates analysis. A new topic, the analysis of asymmetry, is concerned with asymmetric distributions. Asymmetric distributions result when the relationship between two specimens is not the same in both directions, that is, a_ji ≠ a_ij.

Chapter 6, Q-R-mode methods, is a new chapter and represents a reorganization of ideas about Q-R methods that have developed since the publication of the first edition. Correspondence analysis is compared with principal coordinates and Q-mode factor analyses, and the relative merits of each are discussed. The section on the Gabriel biplot method, which was discussed only briefly in the first edition, has been expanded, and there is a new section on the use of canonical correlation analysis.

Chapter 7, Steps in the analysis, provides an outline of the steps required to carry out a factor analysis. The steps are divided into: objectives; categories of data; the data matrix; selection of the measure of association; choice of the factor method; selection of the number of factors; rotation of the factors; and factor scores.
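The later steps in this list can be sketched in a few lines of Python/NumPy. This is only a minimal illustration under stated assumptions, not the book's procedure or its MATLAB code: the data are synthetic, the correlation matrix is taken as the measure of association, principal components are used as the factor method, the eigenvalue-greater-than-one rule for the number of factors and the regression method for factor scores are choices made here for brevity, and the function and variable names are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target of the varimax criterion; the SVD gives the
        # nearest orthogonal update of the rotation matrix.
        target = rotated**3 - rotated * (np.sum(rotated**2, axis=0) / p)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion <= criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

rng = np.random.default_rng(1)

# The data matrix: 50 hypothetical objects, 6 variables driven by 2 latent factors.
pattern = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                    [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
X = rng.normal(size=(50, 2)) @ pattern.T + 0.3 * rng.normal(size=(50, 6))

# Measure of association: the correlation matrix (an R-mode choice).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

# Factor method: principal components of R, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Number of factors: eigenvalue-greater-than-one rule (assumed here).
n_factors = int(np.sum(eigvals > 1.0))
loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])

# Rotation of the factors.
rotated, rotation = varimax(loadings)

# Factor scores by the regression method: Z R^{-1} A.
factor_scores = Z @ np.linalg.solve(R, rotated)
print(n_factors, factor_scores.shape)
```

Because the rotation is orthogonal, the communality of each variable (the row sum of squared loadings) is the same before and after rotation; only the distribution of loadings across factors changes.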
The section on the data matrix includes discussions of the selection of variables; mixed populations of data; dimensions of the data matrix; data scales (nominal, ordinal, interval, or ratio); and data transformations. Another important subject is the selection of the appropriate measure of association for variables or objects. Measures of association include the covariance or correlation matrix for R-mode methods. Other similarity coefficients, used mainly in Q-mode methods, include the major product moment, the cosine association measure, the Euclidean distance measure, and Gower's similarity measure. The section on factor rotation is extensive and covers the varimax, oblique, and promax methods with examples of each.

Chapter 8, Examples and case histories, contains some new examples, including R-mode factor analysis applications in paleoecology and a Q-R-mode analysis of crude oils. In addition, the method of principal warps, from the relatively new field of multivariate analysis of shapes, is introduced. An example is given using measurements taken from ostracod shells.

The book also contains an Appendix of computer programs for multivariate analysis written by Leslie Marcus of Queens College, New York. The Appendix contains listings of programs written for the software package MATLAB. The programs also are available on diskette from the author, or by anonymous ftp on the Internet as described in the text. The program code in the text is easy to read and can be translated easily into any other high-level statistical/mathematical analysis language such as SAS, SPSS, or Splus. Programs are provided for most of the topics covered in each chapter of the book, along with a list of the programs supplied in the Appendix. This Appendix is a valuable addition to the book. Although the programs were not tested, the simplicity of the MATLAB syntax and structure means that they can be checked easily against the mathematics in the text.

Accessing the files via anonymous ftp proved to be more difficult than it should have been. Unfortunately, the files that contain the programs are not identified easily among the myriad of other files and programs in the ftp download directory. The files are self-extracting archives and must be downloaded as binary files. Eventually, after downloading and reading a file named README.SOFTWARE, the correct files were located and downloaded; however, a separate directory should have been established for the files associated with this book.

The second edition has been reworked to a considerable degree and contains many new references that describe advances in multivariate methods and developments since the publication of the first edition. With the exception of the new additions to the book, the figures and tables are unchanged from the first edition. A criticism of the first edition was the lack of an index. An index has been included in the second edition. The figures are laid out clearly and captioned. The numerous tables in the text also are labeled clearly and referenced. As stated by the authors in the Preface, the book is not exhaustive in its list of up-to-date references. This is unfortunate, as several important references are missing on developments in R-mode, Q-mode, and R-Q-mode methods that have appeared in recent years. Despite this weakness, the second edition of this book is a valuable resource.

Geological Survey Branch
Ministry of Energy, Mines & Petroleum Resources
1810 Blanshard Street
Victoria, B.C.
Canada V8V 1X4

ERIC GRUNSKY

REFERENCES

Butler, J. C., 1977, Book Review, "Geological factor analysis": Jour. Math. Geology, v. 9, no. 6, p. 653-754.

Jöreskog, K. G., Klovan, J. E., and Reyment, R. A., 1976, Geological factor analysis, Methods in Mathematics 1: Elsevier Scientific Publ. Co., Amsterdam, 178 p.