Regression Estimators; A Comparative Study

Reviews

The biostratigraphic events are to be compiled into a composite zonation for all of the sections, which will then be used to correlate the data. Obviously, the solution would be simple if the events always occurred in the same order. Unfortunately, this is definitely not the case: biostratigraphers are plagued with missing data and inconsistent relations between taxa, which must be resolved. To complicate the exercise further, two types of composite zonation can be computed for the events. Conservative zonations are basically deterministic and attempt to maximize the range zones of the taxa. Such zonations are sensitive to any source of error that can extend a range zone, for example reworking. Probabilistic zonations are intended to give "average" positions of the events along with estimates of their uncertainty. In my judgement, the RASC (Ranking And SCaling) algorithm of Agterberg is the most feasible, elegant, and comprehensive method now available for determining a probabilistic sequence of biostratigraphic events. The technique is founded on binomial probability: for any two biostratigraphic events I and J, two possibilities are considered, namely that I occurs before J or after it. If the two events are coeval in a stratigraphic section, the two possibilities are allocated 50:50, which has the advantage of minimizing the effects of missing data; this probably also produces an effect analogous to "regression toward the mean". The data on the pairwise distribution of the biostratigraphic events in all stratigraphic sections then serve for compiling the composite ranks of the events, together with a possible error for the location of each event. Several options can be selected for this procedure. Cycles of inconsistent events can be detected and eliminated. The rank order of the events may be adequate for a correlation problem, and many schemes stop at this point.
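To make the pairwise bookkeeping concrete, here is a minimal Python sketch; it is not Agterberg's RASC program. It tallies, over all sections, how often event I lies below event J, splits coeval occurrences 50:50, skips events missing from a section, and then orders the events by a simple score (the summed relative frequency of preceding every other event it co-occurs with). The data layout (a dict from event name to level, with levels assumed to increase upward), the function names, and the scoring rule are all hypothetical choices for illustration; RASC's own ranking and its detection of cycles are more elaborate.

```python
from itertools import combinations


def pairwise_frequencies(sections):
    """Tally how often event i lies below (precedes) event j across sections.

    Each section is a dict mapping event name -> stratigraphic level, with
    levels assumed to increase upward (younger). Events missing from a
    section are simply skipped; coeval events add 0.5 to both orders.
    """
    events = sorted({e for section in sections for e in section})
    freq = {(i, j): 0.0 for i in events for j in events if i != j}
    for section in sections:
        present = [e for e in events if e in section]
        for i, j in combinations(present, 2):
            if section[i] < section[j]:
                freq[(i, j)] += 1.0
            elif section[i] > section[j]:
                freq[(j, i)] += 1.0
            else:  # coeval in this section: split the evidence 50:50
                freq[(i, j)] += 0.5
                freq[(j, i)] += 0.5
    return events, freq


def rank_events(events, freq):
    """Order events by summed relative frequency of preceding the others.

    Illustrative scoring rule only (hypothetical); not RASC's ranking step.
    """
    score = {}
    for i in events:
        total = 0.0
        for j in events:
            if i == j:
                continue
            n = freq[(i, j)] + freq[(j, i)]
            if n > 0:  # the pair was observed together in at least one section
                total += freq[(i, j)] / n
        score[i] = total
    return sorted(events, key=lambda e: (-score[e], e)), score


# Hypothetical toy data: five sections, three events.
sections = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 1, "B": 1, "C": 2},  # A and B coeval
    {"B": 1, "C": 2},          # A missing
    {"A": 1, "B": 2, "C": 3},
    {"A": 2, "B": 1},          # B below A: an inconsistent order; C missing
]
events, freq = pairwise_frequencies(sections)
order, score = rank_events(events, freq)
print(order)  # ['A', 'B', 'C']
```

In the toy data B lies below A in one section, yet the tallies (2.5 against 1.5) still place A before B because the other sections outvote it.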
Agterberg extended the technique so that a continuous distance scale can be constructed for all events, based on normal-distribution probabilities. Measures of uncertainty are available for both the composite sequence and the individual stratigraphic sections. The original RASC assumes normally distributed events with equal variances. The modified RASC method accommodates unequal variances of the events, and its application to actual and simulated data should generate much empirical, and badly needed, information about the distributions of biostratigraphic events. RASC can also treat marker events with zero variance, such as bentonites, and unique events, such as an index fossil present in only one or several stratigraphic sections. In discussing the RASC method, Agterberg presents results from computer simulations used to evaluate its performance, as well as detailed comparisons with other algorithms such as Unitary Associations, Trinomial Probability, and Seriation. An ultimate goal of the analysis is to provide correlations for the samples, a subject that has generally been ignored by quantitative biostratigraphers. Agterberg has researched this problem extensively. CASC (Correlation And SCaling in time) is based on interactive bivariate scatter plots of the composite sequence of the biostratigraphic events and the data for each stratigraphic section. If available, the biostratigraphic scale can be calibrated with information such as radiometric dates, bentonites, and magnetic reversals. The correlations can be obtained with smoothing cubic splines. Estimates of error for the events can be calculated, and the optimum smoothing factor for the splines can be determined experimentally. A major advantage of the RASC and CASC algorithms is that the computer programs are in the public domain and easily available. The programs are inexpensive, have been tested extensively, are well documented, are generally user-friendly, and contain numerous options at all steps of the analyses. As with any book of this size, there are some minor glitches. Examples are the errors in the numbering of matrix rows and columns on p. 146 and several scrambled algebra rules of Rubel on p. 167. Part of Figure 5.4 on p. 164 cannot be read. Luckily, these are few, and the book is generally well produced. Overall, Automated Stratigraphic Correlation is a major compilation that should be read by all practicing biostratigraphers. The explanations are clear and typically complete, and enough worked examples are included that most readers should be able to follow the basic methods and concepts. Even if most of the numerics are ignored, the underlying principles of the RASC and CASC algorithms can be grasped. There is also much here for the most quantitative biostratigrapher. Although I have published papers on this subject dating back to 1978, I learned some new material and thus benefited from the book. My major criticism is the relatively high cost of $107.75 (U.S.).
Consequently, Automated Stratigraphic Correlation will tend to be purchased by libraries rather than by the practicing biostratigraphers to whom it really belongs. However, the book is available to members of the International Association for Mathematical Geology at a 20% discount, for a price of $86.25 (U.S.) if prepaid in Amsterdam before 31 August 1991.
JAMES C. BROWER
Heroy Geology Laboratory
Syracuse University
Syracuse, NY 13244-1070, U.S.A.

Regression Estimators; A Comparative Study, by Marvin H. J. Gruber, 1990, Academic Press, Inc., Boston, xi + 347 p., ISBN 0-12-304752-8, $49.95 (U.S.).

The theory and applications of regression analysis have expanded tremendously during the past 20 years. This book is an up-to-date comparative study of a number of different estimators for the parameters \(\beta_i\) (\(i = 1, 2, \ldots, m\)) of the linear regression model, in which the dependent variable is expressed as a linear function of \(m\) explanatory variables plus a random variable. The linear model can be written in the form \(Y = X\beta + e\), where \(X\) is a known \(n \times m\) matrix of rank \(s\) [...]

\[ \mathrm{TV}(b_R) = \operatorname{tr} D(b_R) = \sigma^2 \sum_{i=1}^{s} \frac{\lambda_i}{(\lambda_i + k)^2} \]

where the \(\lambda_i\) are the nonzero eigenvalues of \(X'X\). If there is near multicollinearity, one or more of the eigenvalues are small. In general, \(\mathrm{TV}(b_R)\) is considerably less than \(\mathrm{TV}(b)\), which satisfies the preceding equation with \(k = 0\). Also, for a proper choice of \(k\), the mean square error (MSE) of the ridge estimator, \(\mathrm{MSE}(b_R) = E(b_R - \beta)'(b_R - \beta)\), is less than \(\mathrm{MSE}(b) = E(b - \beta)'(b - \beta)\) of the LS estimator. On the other hand, \(b_R\) is a biased estimator of \(\beta\), whereas \(b\) is unbiased. Other alternatives, which were discovered more or less independently, are the Bayes, generalized ridge, mixed, and minimax estimators. The Bayesian approach assumes that the parameters being estimated have a prior distribution. The other types of estimators make use of inequality constraints and additional observations. In general, the alternative estimators are more efficient than the LS estimators in the sense that their MSE is smaller. As pointed out by Gruber, these alternative estimators have many similarities, and some are special forms of others. For example, the ridge estimator can be viewed as an LS estimator where additional "fictitious observations" are taken. The mixed estimator is an LS estimator for a linear model augmented by taking additional observations. Surprisingly, the mixed estimator, as well as other estimators, can also be regarded as a special type of Bayes estimator. The book consists of the following chapters: (I) Introduction; (II) Mathematical and statistical preliminaries; (III) The estimators; (IV) How the different estimators are related; (V) Measures of efficiency of the estimators; (VI) The average MSE; (VII) The MSE neglecting the prior assumptions; (VIII) The MSE for incorrect prior assumptions; (IX) The Kalman filter; and (X) Experimental design models. The last two chapters apply the results of Chapters III-VII to Kalman filters and analysis of variance (ANOVA). The Kalman filter differs from the conventional linear model because its parameters are not constant but change with time. It was introduced in the 1960s in journals read primarily by control engineers. In successive sections, Gruber formulates the Kalman filter as a Bayes, mixed, and minimax estimator, and explains its relationship with the generalized ridge regression estimator. In the Bayesian version of the Kalman filter, the estimators at time \(t\) are used as prior information for estimating the parameters at time \(t + 1\). This book is a tour de force on relationships between estimators. In addition to the theoretical discussions, it contains many worked numerical examples and about 100 exercises. The author has written it for an audience consisting primarily of professional statisticians and users of ridge-type estimators. Relatively few earth scientists belong to the latter category. Because of the clarity of its exposition, the volume should also appeal to a wider group of users of regression analysis.
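The review notes that the ridge estimator can be viewed as an LS estimator with additional "fictitious observations", and it expresses the total variance of the ridge estimator through the eigenvalues of \(X'X\). The pure-Python sketch below checks both statements numerically. It assumes the standard Hoerl-Kennard form \(b_R = (X'X + kI)^{-1}X'Y\), which the review does not spell out, and the helper names and the nearly collinear data set are hypothetical. Ordinary LS on data augmented with the rows \(\sqrt{k}\,I\) and zero responses reproduces \(b_R\), and the trace of \(D(b_R) = \sigma^2 (X'X + kI)^{-1} X'X (X'X + kI)^{-1}\) agrees with \(\sigma^2 \sum_i \lambda_i / (\lambda_i + k)^2\). The eigenvalue check uses the closed form for a 2 x 2 matrix, so it is limited to two explanatory variables.

```python
import math


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def shifted_gram(X, k):
    """Return X'X + kI."""
    g = matmul(transpose(X), X)
    return [[g[i][j] + (k if i == j else 0.0) for j in range(len(g))] for i in range(len(g))]


def ridge(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y; k = 0 gives the LS estimator b
    (which then requires X'X to be invertible)."""
    xty = matmul(transpose(X), [[v] for v in y])
    return [row[0] for row in matmul(inverse(shifted_gram(X, k)), xty)]


def ridge_via_fictitious_observations(X, y, k):
    """Plain LS on data augmented with m fictitious rows sqrt(k) * I and responses 0."""
    m = len(X[0])
    X_aug = X + [[math.sqrt(k) if i == j else 0.0 for j in range(m)] for i in range(m)]
    y_aug = list(y) + [0.0] * m
    return ridge(X_aug, y_aug, 0.0)


def total_variance(X, k, sigma2):
    """Trace of D(b_R) = sigma^2 (X'X + kI)^{-1} X'X (X'X + kI)^{-1}."""
    a_inv = inverse(shifted_gram(X, k))
    d = matmul(matmul(a_inv, shifted_gram(X, 0.0)), a_inv)
    return sigma2 * sum(d[i][i] for i in range(len(d)))


def total_variance_from_eigenvalues(X, k, sigma2):
    """sigma^2 * sum(lambda_i / (lambda_i + k)^2), using the closed-form
    eigenvalues of a 2 x 2 symmetric X'X (two explanatory variables only)."""
    (a, b), (_, c) = shifted_gram(X, 0.0)
    half_gap = math.hypot((a - c) / 2, b)
    lams = [(a + c) / 2 + half_gap, (a + c) / 2 - half_gap]
    return sigma2 * sum(lam / (lam + k) ** 2 for lam in lams)


# Hypothetical, nearly collinear data: the second column is close to 2x the first.
X = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.1]]
y = [3.0, 6.0, 9.0, 12.2]
k, sigma2 = 0.5, 1.0

print(ridge(X, y, k))
print(ridge_via_fictitious_observations(X, y, k))
print(total_variance(X, 0.0, sigma2), total_variance(X, k, sigma2))
```

With these data the small eigenvalue of \(X'X\) is about 0.005, so the LS total variance (\(k = 0\)) is dominated by \(1/\lambda_{\min}\), and even a modest \(k\) shrinks it sharply, illustrating the review's remark about near multicollinearity.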
However, it should be kept in mind that the book is not concerned with some other important aspects of regression analysis, such as nonlinear regression, variable selection, regression diagnostics, and nonparametric regression.
FREDERIK P. AGTERBERG
Geological Survey of Canada
601 Booth Street
Ottawa, Ontario, Canada K1A 0E8