Original Research Paper
Chemometrics and Intelligent Laboratory Systems, 2 (1987) 221-232
Elsevier Science Publishers B.V., Amsterdam. Printed in The Netherlands

Predicting Oil-Well Permeability and Porosity from Wire-Line Petrophysical Logs: A Feasibility Study Using Partial Least Squares Regression

KIM H. ESBENSEN* and HARALD MARTENS
Norwegian Computing Center, Forskningsveien 1B, Postboks 114 Blindern, 0314 Oslo 3 (Norway)
ABSTRACT

Esbensen, K.H. and Martens, H., 1987. Predicting oil-well permeability and porosity from wire-line petrophysical logs: a feasibility study using partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 2: 221-232.
A method is reported for predicting permeability, porosity and other petrophysical core parameters directly from wire-line logs in oil wells. It is based on multivariate calibration using principal component analysis and partial least squares (PLS) regression. The method is applied to data from a single North Sea oil well over a depth interval of 125 m. Samples with log parameters indicating abnormal lithologies, or with excessive missing data, were rejected as "outliers". Calibration was carried out by relating a set of seventeen non-selective, in situ wire-line petrophysical log variables from various depths in the well to petrophysical laboratory determinations. These determinations were subsequently performed on drill-core samples from the corresponding depths. Calibration modelling was carried out in two ways: treating the samples as a single lithology class (reservoir sand), and as high- and low-permeability classes (coarse and fine sands). Various linearizing transformations of the input data were tested; (permeability)^(1/3) was found to be optimal. The predicted permeability and porosity closely covary with the core control data down the entire well. Approximately 50% of the total data variance was correctly reproduced, and the same stratigraphic horizons were identified. The residual variance is due to measurement error, depth-shift discrepancies and model errors (non-linearities and data subgroupings). The PLS model was found to be statistically stable by two different validation methods (leverage correction and cross-validation). This study demonstrates in situ wall-rock parameter prediction from core calibration data in one well. It simultaneously simulates inter-well prediction for other, non-cored wells. PLS prognosis for oil/gas wells appears feasible at the present precision level of log and laboratory measurements and uncertainty in core-depth assignment.
INTRODUCTION

Multivariate calibration
0169-7439/87/$03.50 © 1987 Elsevier Science Publishers B.V.

Chemistry and geochemistry are two prominent fields of application of multivariate calibration, which has also seen important theoretical developments in the last decade [1-4]. Much interest has centred on the so-called partial least squares (PLS) regression method(s). There are a number of possible approaches to multivariate calibration [5,6]. PLS regression is characterized primarily by three features:
- conceptual simplicity;
- an ability to give reasonable models when both the regressands Y and the regressors X contain much noise;
- practical success in multivariate calibration [4].

The mode A multifactor PLS regression with orthogonalized scores [3] is a special version of Wold's more general PLS principle for analysing systems under indirect observation [7]. The PLS principle is to minimize several simple sums of squares of the residuals between parts of the data and the corresponding parts of the data structure model. The "soft modelling" PLS regressions are usually less difficult to employ than regressions and systems modelling fundamentally based on the maximum likelihood principle [7].

Contrary to standard multiple linear regression (MLR), but in analogy with principal component regression (PCR), PLS regression explicitly accounts for multicollinear X-variables (and Y-variables) with strong measurement errors. PCR is a well-established multivariate calibration method. This property is important, for instance, if there are more X-variables than objects. However, both PCR and PLS regression would give the MLR solution if the matrix of X-variables could be regarded as error-free and without multicollinearity, and hence modelled with full column rank. In contrast to PCR, however, PLS regression also makes explicit use of the Y-data to optimize the relevance of the latent variables estimated from the X-data. This tends to give simpler models that are often easier to interpret and often have better prediction abilities [4,6].

Present study

A feasibility study has been carried out on the potential for PLS prediction of the petrophysical core parameters permeability and porosity from associated wire-line well logs. We simulate predicting permeability and porosity depth patterns in newly drilled wells directly from their well-log data. The prediction relies on a calibrated well with at least one laboratory drill-core data set.
Our objectives are as follows:
(1) Modelling the two core parameters, permeability and porosity, from the available wire-line geophysical well-log parameters. A secondary objective is to assess eight other laboratory core parameters in relation to both the predicting well logs and the predicted permeability and porosity variables. GDAM and SIMCA methodology is used for outlier rejection and for delineation of the major facies [1,2].
(2) Evaluation of the potential for PLS prediction on this screened data set. Both modelling and prediction assessment are carried out by multivariate calibration. The PLS regression technique [3,4] is used, as implemented in the UNSCRAMBLER program*.
(3) Graphical presentation of the statistical results, to facilitate detection of the geologically most relevant results. The prediction results are also presented graphically, to facilitate comparison with the reference parameters (core variables).
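The PLS regression technique named in objective (2) can be illustrated with the sketch below. It fits a single-response (PLS1) model by the NIPALS algorithm with orthogonalized scores and then predicts new samples. It is not the UNSCRAMBLER implementation. The study itself used the two-block PLS2 variant with ten Y-variables. The function names `pls1_fit` and `pls1_predict` and the toy data are hypothetical. In the toy example, two factors equal the rank of the centred X-matrix, so the model reproduces the exact MLR fit, in line with the remark in the Introduction.

```python
import math


def pls1_fit(X, y, n_factors):
    """Fit a single-response PLS (PLS1) model by NIPALS with orthogonalized scores.

    X: list of samples, each a list of p log values; y: one core value per
    sample. Returns the centring means plus, for each factor, the weight
    vector w, the X-loading vector p and the y-loading q.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                  # y residuals
    W, P, q = [], [], []
    for _ in range(n_factors):
        # Weight vector: X-direction of maximal covariance with the y residuals.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:          # nothing left to model: stop early
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p_a = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q_a = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y so the next factor's scores are orthogonal to this one.
        E = [[E[i][j] - t[i] * p_a[j] for j in range(p)] for i in range(n)]
        f = [f[i] - t[i] * q_a for i in range(n)]
        W.append(w)
        P.append(p_a)
        q.append(q_a)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "q": q}


def pls1_predict(model, x):
    """Predict y for one new sample by projecting it factor by factor."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p_a, q_a in zip(model["W"], model["P"], model["q"]):
        t = sum(ej * wj for ej, wj in zip(e, w))
        e = [ej - t * pj for ej, pj in zip(e, p_a)]
        y_hat += t * q_a
    return y_hat


# Toy data (hypothetical): y is an exact linear function of two X-variables.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [2 * a - b + 1 for a, b in X]
model = pls1_fit(X, y, n_factors=2)
print([round(pls1_predict(model, row), 6) for row in X])  # [1.0, 4.0, 3.0, 6.0, 6.0]
```

With fewer factors than the rank of X, the same functions give the damped, reduced-rank predictions that are typical of PLS calibration on noisy, collinear logs.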
DESCRIPTION OF DATA SET
The data consist of 518 depths from a test well in the Norwegian North Sea, spanning the general depth interval 1600-1750 m in a reservoir rock zone. Samples from a slightly smaller interval, totalling 486 depths, were all from a water-saturated zone; these constitute the basis for the present study. Table 1 lists the variable designations used in the established petroleum geology terminology. The X-variables are the in situ wire-line logs upon which the prediction relationship is built (the X-space in regression terminology). The Y-space is made up of the petrophysical variables. These are obtained by slower and more expensive laboratory examinations of core samples. The log parameters consist of 20 variables, including the depth assignment in the well, Z. X-variables 18, 19 and 20 are standard petrophysical log-derived estimates of porosity, permeability and bulk density, respectively; these are specifically not used in the final calibration below. The variable designations used in the statistical plots are also given in Table 1. Table 2 lists some descriptive statistics for these X- and Y-variables.

* UNSCRAMBLER User's Manual, CAMO A/S, P.O. Box 5115, N-7001 Trondheim, Norway.

TABLE 1
Variable designations
The letters correspond to standard petroleum geology terminology.

Variable No.   X-variable            Plot designation
 1             Z (depth of sample)   1
 2             RT                    2
 3             PHIN                  3
 4             RHOB                  4
 5             DT                    5
 6             GR                    6
 7             CAL                   7
 8             RLLS                  8
 9             RLLD                  9
10             CGR                   A
11             SGR                   B
12             THOR                  C
13             URAN                  D
14             POTA                  E
15             RSFL                  F
16             RILD                  G
17             RMSFL                 H
18             DPOR                  I
19             DKLH                  J
20             DROHMA                K

Variable No.   Y-variable            Plot designation
21             DGRSZ                 L
22             MGRSZ                 M
23             SRT                   N
24             MICA                  O
25             CBNT                  P
26             COAL                  Q
27             CON                   R
28             RDNS                  S
29             KLH (permeability)    T
30             PORHE (porosity)      U
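Table 2 lists five statistics per variable: mean, S.D., minimum, maximum and skewness. The sketch below computes them for one variable. The text does not state which estimators were used, so two choices here are assumptions: the n - 1 divisor for the S.D., and the moment coefficient g1 = m3/m2^1.5 for skewness. The function name `describe` and the example values are hypothetical.

```python
import statistics


def describe(values):
    """Mean, S.D., minimum, maximum and skewness of one variable (cf. Table 2).

    The S.D. uses the n - 1 divisor and the skewness is the moment coefficient
    g1 = m3 / m2**1.5; both estimator choices are assumptions, since the
    text does not state which were used.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return {
        "mean": mean,
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
    }


# A hypothetical right-skewed variable, like core permeability before transformation.
print(round(describe([1, 2, 3, 10])["skewness"], 3))  # 1.018
```

A large positive skewness is what motivates the transformations applied below, for example to RMSFL and KLH.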
OUTLINE OF DATA ANALYSIS
Data transformations

Variable 17 (RMSFL) spans the range 0.61-1.94. Previous geological and statistical results show that this is a key variable, usually considered to be log-normally distributed; its positive skewness of 2.3 supports this. Because of the low range, the transformation log10(RMSFL + 1) is used. Variables 21 and 22 (DGRSZ and MGRSZ) also have a marked positive skewness; from previous experience, a logarithmic transformation was used for these. Variable 29 (KLH, core permeability) also displays a positive skewness of about 2.1. This parameter plays a central role in oil engineering, so industry experience has accumulated on its distributional form. That experience suggests the use of (KLH)^(1/3). This transformation was found to give satisfactory results, superior to those based on the more conventional log10(KLH + 1) transformation. The skewness was lowered significantly by these transformations.

Eliminating outliers
For optimal prediction and interpretation it is important to calibrate on reasonably homogeneous sample sets. Here we attempt to calibrate for normal, representative sand samples. Abnormal lithologies such as carbonates, and otherwise erroneous data (e.g., samples with large depth shifts), should be eliminated. Therefore, before Run A could be carried out, a full-scale outlier detection was undertaken. Details of the iterative outlier detection procedures are not reported here; the principles of the relevant SIMCA/GDAM methodology are outlined by Esbensen et al. [1] elsewhere in this issue. Four successive runs were necessary before all major outliers were identified. It is geologically significant that most of the outliers detected individually in these consecutive GDAM runs show up in coherent stratigraphic intervals (Fig. 1). This gives confidence that the statistical outlier criterion has a definite geological basis. The subsequent UNSCRAMBLER PLS prediction runs detected more subtle outlying samples. Some of these were deemed large enough to cause problems in the modelling/prediction phase, and they were also removed. It is again geologically important that these samples showed up as "shoulders" to the already delineated outlier intervals. Fig. 1 details the final outlier assignments, totalling 133 depths.

TABLE 2
Descriptive statistics for the total data set (N = 518)

Variable No.   Mean      S.D.      Minimum   Maximum    Skewness
 1             1665.0    37.0      1602.0    1730.0      0.03
 2             0.71      0.13      0.34      1.23        1.18
 3             0.31      0.03      0.24      0.4         0.04
 4             2.1       0.03      2.05      2.24        0.55
 5             107.7     4.8       90.8      116.6      -0.55
 6             47.7      9.8       31.0      82.0        1.1
 7             8.28      0.03      8.0       8.37       -4.8
 8             0.62      0.14      0.43      1.29        2.04
 9             0.67      0.12      0.47      1.25        2.01
10             53.2      10.6      40.0      90.0        1.55
11             54.4      10.4      38.9      93.0        1.37
12             2.12      1.4       0.07      8.17        1.86
13             0.15      0.38     -2.27      1.36       -0.7
14             0.03      0.004     0.019     0.04        0.81
15             0.68      0.15      0.52      1.47        2.16
16             0.55      0.15      0.38      1.45        2.6
17             0.86      0.22      0.61      1.94        2.28
18             0.33      0.03      0.03      0.38       -4.21
19             6830.0    7106.0    0.01      50677.0     2.25
20             2.66      0.02      2.56      2.78        1.3
21             378.0     209.0     34.0      3000.0      6.4
22             2145.0    1778.0    151.0     15000.0     2.06
23             5.7       1.64      2.0       8.0        -0.9
24             0.31      0.6       0.0       3.0         2.04
25             0.09      0.35      0.0       3.0         5.15
26             0.27      0.48      0.0       2.0         1.51
27             1.65      0.57      1.0       4.0         0.27
28             2.6       0.52      2.0       4.0        -0.13
29             7438.0    7836.0    0.0       50677.0     2.1
30             0.33      0.03      0.03      0.38       -4.15
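Leverage is one of the diagnostics used in this study to judge outlying samples (see Run A below). It measures how far a sample lies from the centre of the calibration set in the space of the PLS scores. The sketch below uses the common definition h_i = 1/n + Σ_a t_ia² / Σ_j t_ja² for orthogonal score vectors. The function names and the example scores are hypothetical. The cut-off of three times the average leverage in `flag_high_leverage` is also a hypothetical rule of thumb, not the GDAM/SIMCA criterion used in the study.

```python
def leverages(scores):
    """Leverage of each sample from its PLS factor scores.

    scores[i][a] is the score of sample i on factor a; the score vectors are
    assumed mean-centred and mutually orthogonal, as in PLS with
    orthogonalized scores.  h_i = 1/n + sum_a t_ia**2 / sum_j t_ja**2.
    """
    n = len(scores)
    n_factors = len(scores[0])
    # Sum of squared scores for each factor (the denominator per factor).
    ss = [sum(row[a] ** 2 for row in scores) for a in range(n_factors)]
    return [1 / n + sum(row[a] ** 2 / ss[a] for a in range(n_factors))
            for row in scores]


def flag_high_leverage(scores, factor=3.0):
    """Indices of samples whose leverage exceeds `factor` times the mean leverage.

    The default cut-off is a hypothetical rule of thumb for illustration.
    """
    h = leverages(scores)
    h_mean = sum(h) / len(h)   # equals (1 + number of factors) / n
    return [i for i, hi in enumerate(h) if hi > factor * h_mean]


# Hypothetical one-factor scores: eighteen ordinary samples and two extreme ones.
scores = [[0.5], [-0.5]] * 9 + [[4.0], [-4.0]]
print(flag_high_leverage(scores))  # [18, 19]
```

In practice, leverage is examined together with the residual X and Y variances. A sample can be influential without being far from the model, and it can fit badly without having high leverage.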
Fig. 1. Schematic stratigraphic pattern of the 133 outliers (black) along the length of the drill hole in the water-saturated zone; axis: depth, Z.

TABLE 3
Prediction residuals of the individual X- and Y-variables in Run B (= C) after four PLS factors
The second column gives the residual variance (mean square error) as a percentage of the total variance of each variable. The third column expresses the residual in absolute terms as the error standard deviation (obtained by de-weighting the square root of the second column). The last column shows the corresponding total standard deviation.

Variable   % prediction error variance   Absolute error S.D. of PLS modelling   Total S.D.
X:
 1         18                            16                                     37
 2         28                            0.061                                  0.12
 3         15                            0.011                                  0.029
 4         11                            0.013                                  0.040
 5         12                            1.6                                    4.7
 6         12                            3.3                                    9.5
 7         24                            0.0072                                 0.015
 8          8                            0.036                                  0.13
 9          7                            0.030                                  0.11
10          2                            1.5                                    10.4
11          5                            2.3                                    10.5
12         26                            0.68                                   1.34
13         74                            0.30                                   0.36
14         20                            0.0018                                 0.0041
15          8                            0.038                                  0.13
16         15                            0.048                                  0.12
17          8                            0.024                                  0.084
Y:
21         56                            0.47                                   0.63
22         34                            0.72                                   1.2
23         61                            1.3                                    1.6
24         60                            0.50                                   0.65
25        100                            0.20                                   0.20
26         73                            0.43                                   0.50
27         81                            0.51                                   0.56
28         92                            0.48                                   0.51
29         56                            3.9                                    5.2
30         51                            0.017                                  0.024

Multivariate PLS regression

A model is now desired for predicting permeability, porosity and the other eight core variables in Y (variables 21-30, Table 1) from the wire-line logs in X (variables 2-17) for the type of rock material selected: Y = f(X). We chose PLS regression (see the Introduction). Four PLS regression runs were carried out, designated A-D. A summary of these runs is given below.

Run A

This was an initial data analysis (PLS prediction) using all X- and Y-variables in untransformed form, aimed at elucidating the basic data structure in the total data set. Interpretations based on Run A suggested improvements, especially rejection of influential outliers. The effects of the transformations described above were also optimized at this stage. The most critical finding bearing directly on Run B was the marked effect of the large effective ranges of permeability and porosity. Permeability values higher than 10,000 and porosity values lower than 0.20 are considered to be geologically extreme. In Run A, samples with these extreme values invariably emerged as outliers in the PLS analysis, as judged from their leverages and residual X and Y variances [1,4]. We consequently deleted such samples (only fifteen in all).
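The Run A deletion rule and the transformations adopted for Run B can be sketched as follows. Extreme samples are screened on their raw values, and the remaining samples are transformed. The record layout (a dict keyed by Table 1 mnemonics), the function names and the example values are all hypothetical. The text does not state the base of the grain-size logarithm, so natural logarithms are assumed; a different base would only rescale the variable.

```python
import math

# Limits quoted in the text for geologically extreme samples (raw units).
KLH_MAX = 10_000    # permeability above this is treated as extreme
PORHE_MIN = 0.20    # porosity below this is treated as extreme


def is_extreme(sample):
    """True if raw permeability or porosity falls outside the accepted range."""
    return sample["KLH"] > KLH_MAX or sample["PORHE"] < PORHE_MIN


def transform(sample):
    """Apply the linearizing transformations described under Data transformations.

    Natural logarithms for DGRSZ and MGRSZ are an assumption (base not stated).
    """
    out = dict(sample)                       # leave the raw record untouched
    out["RMSFL"] = math.log10(sample["RMSFL"] + 1)
    out["DGRSZ"] = math.log(sample["DGRSZ"])
    out["MGRSZ"] = math.log(sample["MGRSZ"])
    out["KLH"] = sample["KLH"] ** (1 / 3)    # cube root of permeability
    return out


def prepare(samples):
    """Drop extreme samples (screened on raw values), then transform the rest."""
    return [transform(s) for s in samples if not is_extreme(s)]


samples = [
    {"RMSFL": 0.86, "DGRSZ": 378.0, "MGRSZ": 2145.0, "KLH": 1000.0, "PORHE": 0.33},
    {"RMSFL": 0.90, "DGRSZ": 400.0, "MGRSZ": 2000.0, "KLH": 20000.0, "PORHE": 0.34},
    {"RMSFL": 0.70, "DGRSZ": 150.0, "MGRSZ": 900.0, "KLH": 50.0, "PORHE": 0.15},
]
kept = prepare(samples)
print(len(kept), round(kept[0]["KLH"], 6))  # 1 10.0
```

Screening before transforming keeps the thresholds in the units in which they are quoted.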
Run B

This was our final PLS modelling study, with optimized transformation, variable selection and outlier rejection. Run B presents the final feasibility results; several intermediate runs were carried out but are not reported here.

Run C

This was a cross-validation prediction assessment based on Run B; the same modelling characteristics were used in Runs B and C.

Run D

This involves splitting the Run B data set into two subsets, which are then PLS modelled separately. Each subset apparently corresponded to a specific lithological entity (fine and coarse reservoir sand, respectively). This run mainly served to characterize the multivariate correlations of each lithology for comparison with Run B.

RUN B: MODELLING THE MAIN RESERVOIR LITHOLOGIES

The table below gives the proportion of initial variance explained by each of the first four PLS components from the seventeen log variables. More details of the PLS modelling results are given in Table 3.

Factor No.   X (total) (%)   Y (total) (%)   Y9 (permeability) (%)   Y10 (porosity) (%)
1            43              23              45                       0
2            18               8               0                      49
3            14               1               1                       0
4             8               1               0                       0
Total        83              33              46                      49

Fig. 2. Distribution of the 353 remaining main reservoir sand samples along the first two PLS factors, t2 vs. t1 (axes labelled PC1 and PC2).

Fig. 3. Distribution of the X- and Y-variables (variables 1-K and L-U in Table 1) along the first two PLS factors, loadings p2 vs. p1.

Fig. 2 shows the PLS component scores t2 vs. t1 of the 353 reservoir samples after outlier rejection. The stratigraphic positions of these samples can be seen in Fig. 1. These disjoint stratigraphic sand intervals are treated collectively here, as their multivariate characteristics clearly form a continuum. The crescent-shaped point distribution is less marked than the equivalent for Run A, presumably because of the transformations employed in Run B. The remaining non-linearity may have either of two causes. It may reflect mildly skewed distributions of one or more variables with respect to the main factors. Alternatively, it may be due to more complex reservoir lithology. Based on the curved trend in the score plot of the first two factors in Run B (Fig. 2), this population was partitioned at a point corresponding to score t1 = 0, and the two subsets of samples were modelled separately (Run D). From geological and geophysical considerations, these two lithologies can be interpreted as fine and coarse reservoir sands, respectively. Fig. 3 shows the variable loadings p2 vs. p1, corresponding to the objects' scores (Fig. 2). From these figures the following conclusions may be drawn:
(i) Permeability (Y-variable T) is, in these two dimensions, correlated with DGRSZ (L) and MGRSZ (M), which have high positive loadings on the first component.
(ii) In the direction of the second factor, X-variable 4 (RHOB, bulk density) is negatively correlated with X-variables 3 and 5 (PHIN, DT) and with Y-variable 30, porosity (U). These variables display high positive loadings.
(iii) The remaining Y-variables do not display high loadings on either component, with two exceptions. MICA and COAL (O, Q) correlate negatively with the first factor, and SRT (N) correlates positively with the second factor.
It is of interest that the first factor correlates
dominantly with permeability, whereas the second factor correlates with porosity. This means that the factor scores t1 and t2 correlate more or less directly with these target core parameters. Other details of the geological interpretation of the relationships found in the data are not given here.

For geologists, univariate depth plots may be the easiest way of visualizing the results. Figs. 4 and 5 show depth plots of the t1 and t2 scores together with their calibration core parameters, permeability and porosity. Two findings from these plots are noteworthy:
(i) The depth patterns for both scores correspond well with those of the relevant core parameters; almost exactly the same stratigraphic intervals are delineated.
(ii) The scores constitute a damped counterpart to their much more variable calibration parameters. This damping encompasses both measurement/shift uncertainty and model errors. The scores reproduce about 50% of the signal variance in these reference parameters, while removing a similar amount of residual variance (noise).
On the whole, both plots provide evidence that it is possible to model the essential information in these two core parameters directly from the seventeen in situ petrophysical wire-line logs. The Run C section explains why the present depth modelling can also be considered a valid prediction pattern, simulating inter-well calibration.

Fig. 4. Measured core permeability, (KLH)^(1/3), and the first PLS factor score, t1, plotted as functions of depth, Z. Asterisks represent the outliers.
Fig. 5. Measured core porosity (PORHE) and the second PLS factor score, t2, plotted as functions of depth, Z. Asterisks represent the outliers. The latter curve has been shifted vertically so that their similar patterns can be compared.

Fig. 6. Comparison of measured core permeability (abscissa) with the permeability predicted from the wire-line logs via the PLS model (ordinate).

The claim is that these scores can be considered approximate predictors of the core parameters permeability and porosity. To substantiate it further, Figs. 6 and 7 present bivariate plots of predicted vs. measured permeability and porosity. Both display a significant and reasonably linear relationship, summarized by the correlation coefficients r = 0.68 for permeability and r = 0.70 for porosity. There is an approximately 50% scatter around these first-order linear relationships, corresponding to the residual variance mentioned above.
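The link between the quoted correlation coefficients and the "50% scatter" can be made explicit. For a least-squares linear predictor, r² equals the fraction of variance reproduced. The fitted values are also damped: their S.D. is |r| times that of the measured values. For r = 0.68 and 0.70 the unexplained fractions come out close to the 56% and 51% residual variances in Table 3. This is an idealization: the PLS predictions in Figs. 6 and 7 are not themselves a straight-line fit, so the correspondence is approximate. The helper names below are hypothetical.

```python
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def least_squares_fit(x, y):
    """Fitted values of y from a least-squares straight line on x (a damped copy of y)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [my + slope * (a - mx) for a in x]


# Correlations quoted for Figs. 6 and 7.
for name, r in [("permeability", 0.68), ("porosity", 0.70)]:
    print(f"{name}: r^2 = {r * r:.2f}, unexplained = {1 - r * r:.2f}")
# permeability: r^2 = 0.46, unexplained = 0.54
# porosity: r^2 = 0.49, unexplained = 0.51
```

The variance ratio var(fitted)/var(measured) = r² is exact for a least-squares line, which is why such predictions always look damped relative to the noisy reference data.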
Fig. 7. Comparison of measured core porosity (abscissa) with the porosity predicted from the wire-line logs via the PLS model (ordinate).

RUN C: STATISTICAL ASSESSMENT OF THE PLS MODELLING
It is dangerous to estimate model parameters from a given calibration data set and then test the predictive ability of the resulting model on the same data set without special precautions. Spurious effects in the data can affect the modelling. The testing may therefore lead to over-optimistic conclusions about the quality of the model obtained (over-fitting). In this study we report residual variances as estimated prediction variances, not as ordinary lack-of-fit residual variances. They were obtained by a new and efficient estimation procedure based on "leverage correction" of the calibration-set residuals. The procedure transforms the lack-of-fit residuals to compensate for the modelling influence of outliers or spurious variables in the calibration data. It allows all the data to be used for model estimation, while at the same time being computationally efficient. This method is now used routinely for validation in the UNSCRAMBLER program and is described more fully by Martens [4].

However, as this validation method is relatively new, we compared it with the more conventional full cross-validation method [8]. In cross-validation, model estimation is performed on a reduced data set. Parts of the data are left out, and the model obtained from the remainder is tested on them in completely independent prediction. This procedure is repeated several times, each time predicting a different subset of the data, until every data vector has been left out once. Table 4 gives the results from the two validation methods. It shows the mean squared error between predicted and measured values for the ten Y-variables, as a percentage of their total variances in the data set. The two methods of prediction testing give very similar results, as expected in this situation.
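The two validation schemes compared in Table 4 can be sketched side by side. The sketch assumes that the leverage correction takes the familiar form e_i/(1 - h_i), which inflates each calibration residual by its leverage. For an ordinary least-squares model this reproduces the leave-one-out residuals exactly; for PLS it is an approximation. A one-predictor straight line stands in for the PLS model. The interleaved assignment of samples to the four cross-validation segments is also an assumption. Function names and data are hypothetical.

```python
import statistics


def fit_line(x, y):
    """Least-squares intercept and slope (a stand-in for the PLS model)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def leverage_corrected_msep(x, y):
    """Mean squared prediction error from calibration residuals e_i / (1 - h_i)."""
    n = len(x)
    a, b = fit_line(x, y)
    mx = statistics.fmean(x)
    sxx = sum((v - mx) ** 2 for v in x)
    total = 0.0
    for xi, yi in zip(x, y):
        h = 1 / n + (xi - mx) ** 2 / sxx   # leverage of sample i
        e = yi - (a + b * xi)              # ordinary lack-of-fit residual
        total += (e / (1 - h)) ** 2        # inflate by the leverage
    return total / n


def cross_validated_msep(x, y, n_segments):
    """Segmented cross-validation: each sample is predicted once, by a model
    fitted without its segment (segments interleaved by sample index)."""
    n = len(x)
    total = 0.0
    for s in range(n_segments):
        train = [i for i in range(n) if i % n_segments != s]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        total += sum((y[i] - (a + b * x[i])) ** 2
                     for i in range(n) if i % n_segments == s)
    return total / n


def as_percent_of_variance(msep, y):
    """Express a prediction error as % of the total variance of y, as in Table 4."""
    return 100 * msep / statistics.pvariance(y)


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9, 7.2, 7.7]
print(round(as_percent_of_variance(leverage_corrected_msep(x, y), y), 2))
print(round(as_percent_of_variance(cross_validated_msep(x, y, 4), y), 2))
```

The leverage-corrected estimate needs only one model fit. Segmented cross-validation refits the model once per segment, which is the computational saving the text refers to.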
The large number of samples in the present calibration data set is in itself a guard against effects from modelling any outliers that may still remain in the data. Table 4 suggests that the Run A outlier rejections have been satisfactory. Moreover, when laboratory core data are used for calibration, some of the apparent prediction error is due to noise in these input data, and not only to the modelling. Hence the predictive abilities for permeability and porosity presented here can be regarded as true prediction abilities and not artefacts of over-fitting; this also applies to the other core variables.

TABLE 4
Comparison of two validation methods (Run C): prediction error of the ten core variables estimated by leverage-corrected calibration residuals and by cross-validation
Cross-validation was performed four times, each time predicting 25% of the data. The averages of these four prediction tests are given together with their standard deviations.

Variable   Leverage-corrected (%)   Cross-validation (%)   S.D. (%)
21          56                       56                     2
22          34                       34                     1
23          61                       62                     1
24          60                       61                     3
25         100                      103                    20
26          73                       74                     4
27          81                       82                     3
28          92                       92                     3
29          56                       56                     2
30          51                       51                     2

TABLE 5
Relative importance of the first four PLS factors in modelling
Results are given for the average of the seventeen log variables used as X and of the ten core variables used as Y. The variances of the variables after zero factors (only the averages used in modelling) are used as the baseline (100%). The prediction variances remaining after successive factors have been estimated and subtracted are given as a percentage of this initial variance.

No. of PLS factors   Average residual variance (%)
used in model        X = 17 log variables   Y = 10 core variables
0                    100                    100
1                     57                     77
2                     39                     69
3                     25                     68
4                     17                     67

Detailed description of the PLS model

The first two PLS factors represent most of the predictive ability of the seventeen log variables X for the ten core variables Y, as summarized in Table 5. Even so, the third and fourth PLS factors have some predictive validity, and in addition give a minor improvement in the modelling of the log variables X. The results from the four-factor PLS solution are therefore used in the more detailed study of how well the individual variables are described by the PLS model. The two-block PLS2 method [2] was used to estimate a low-dimensional linear model that predicts the ten core variables Y from the seventeen log variables X. This method optimizes the "average" fit of the model to the 27 different variables. Among these X- and Y-variables, some may be better modelled than others. Table 3 shows the estimated prediction errors for the individual variables with the four-factor PLS model. These error levels should be compared with the individual measurement errors of both the X- and Y-variables. Note that for transformed variables these errors are in transformed units. Among the core variables Y, MGRSZ (No. 22) is predicted best from the seventeen log variables, while CBNT (No. 25) and RDNS (No. 28) are essentially not predicted at all. Among the log variables X, variables 4, 5, 6, 8, 9, 10, 11, 15 and 17 are well modelled, with CGR (No. 10) showing the lowest relative error variance. Log variable URAN (No. 13) is the only X-variable not well described by the four-factor PLS model.
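Table 3's caption derives the absolute error S.D. from the relative error variance: the square root of the percentage, "de-weighted" by the variable's total S.D. The successive entries of Table 5 also give the variance explained by each factor. The small sketch below does both conversions, using values from the tables. The function names are hypothetical. The X- and Y-block differences reproduce the per-factor percentages reported for Run B.

```python
import math


def absolute_error_sd(pct_error_variance, total_sd):
    """Convert a relative prediction error variance (% of total) to an error S.D."""
    return total_sd * math.sqrt(pct_error_variance / 100)


def explained_per_factor(residual_pct):
    """Variance explained by each successive factor from cumulative residual percentages."""
    return [a - b for a, b in zip(residual_pct, residual_pct[1:])]


# Table 3, Y-variables 29 and 30.
print(round(absolute_error_sd(56, 5.2), 1))    # permeability (transformed units): 3.9
print(round(absolute_error_sd(51, 0.024), 3))  # porosity: 0.017
# Table 5, X and Y blocks.
print(explained_per_factor([100, 57, 39, 25, 17]))  # [43, 18, 14, 8]
print(explained_per_factor([100, 77, 69, 68, 67]))  # [23, 8, 1, 1]
```

Agreement with the tabulated values is only to the rounding precision of the published percentages.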
RUN D: SEPARATE MODELLING OF HIGH- AND LOW-PERMEABILITY LITHOLOGIES
The score plot (Fig. 2) in Runs A and B showed a crescent-shaped relationship in the log data X. To investigate this structure in more detail, the two parts of the data swarm were analysed separately by PLS, with the same characteristics as Runs B and C. The sample set was split into two subsets, one with samples having positive scores t1 and the other with negative scores. These two subsets were analysed independently: D1 (t1 ≥ 0, high permeability) and D2 (t1 < 0, low permeability). D1 apparently corresponds to coarse reservoir sands and D2 to fine sands. Table 6 lists the results of the D1 and D2 runs. In the high-permeability subset (D1) the permeability is almost constant. The four-component PLS model therefore describes hardly more of this variation than a zero-component model, i.e. one that simply predicts the average value. In absolute terms, however, the prediction error is about the same as in the low-permeability subset (D2) and in the total Run B.
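The Run D partition and its effect on relative errors can be sketched briefly. Samples are split by the sign of their first-factor score. A subset with a narrow permeability range, such as D1, then shows a much higher relative error variance than a wide-ranging subset, even when the absolute error S.D. is the same. This is the pattern seen in Table 6. The permeability values, scores and the common absolute error of 1.0 are hypothetical, and the helper names are illustrative only.

```python
import statistics


def split_by_first_score(samples, t1):
    """Split samples into D1 (t1 >= 0) and D2 (t1 < 0), as in Run D."""
    d1 = [s for s, t in zip(samples, t1) if t >= 0]
    d2 = [s for s, t in zip(samples, t1) if t < 0]
    return d1, d2


def relative_error_pct(abs_error_sd, values):
    """Express an absolute error S.D. as % of the variance of `values`."""
    return 100 * abs_error_sd ** 2 / statistics.variance(values)


# Hypothetical transformed permeabilities and first-factor scores.
perm = [15.0, 17.5, 16.0, 18.0, 6.0, 14.0, 9.0, 12.0]
t1 = [0.8, 1.2, 0.5, 1.5, -1.4, -0.2, -1.0, -0.6]
d1, d2 = split_by_first_score(perm, t1)
for name, subset in [("D1", d1), ("D2", d2)]:
    print(name, round(relative_error_pct(1.0, subset)))
# D1 53
# D2 8
```

The same absolute error is a large share of D1's small variance but only a small share of D2's larger variance.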
TABLE 6
Permeability and porosity modelled in high- and low-permeability samples separately
D1 designates high- and D2 low-permeability samples. Permeability values are in transformed units. X-variables: 1-K (see Table 1); Y-variables: L-U (see Table 1).

Variable            Run      Average   % prediction error variance   Absolute error S.D. from PLS modelling   Total S.D.
29 (permeability)   Run B    14.1      56                            3.9                                      5.2
                    Run D1   16.4      97                            3.6                                      3.7
                    Run D2   10.2      53                            3.7                                      5.1
30 (porosity)       Run B    0.33      51                            0.017                                    0.024
                    Run D1   0.33      42                            0.015                                    0.023
                    Run D2   0.33      68                            0.022                                    0.027
For the present evaluation of Run D we restrict ourselves to the direction of the first factor. Here we observe a complete reworking of the variable inter-relationships for D1. Porosity is strongly correlated with PHIN, DT, GR, CGR, SGR and POTA. These in turn display a marked negative correlation with MGRSZ, RT, RHOB, RLLS, RLLD, RSFL and log10(RMSFL). Permeability has only a small loading on factor 1, confirming its low prediction ability. For D2 we observe a significant similarity with the Run B results for most variables. This is as expected, because the D2 leg of the Run B score plot (Fig. 2) displays large variation. Overall, Run B factor 2 tends to resemble Run D2 factor 1; for example, porosity and RHOB are negatively correlated along factor 1 in Run D2.

DISCUSSION
The present PLS analysis closely models and predicts the depth patterns delineated by the core parameters permeability and porosity (Figs. 4 and 5), since the same stratigraphic intervals are outlined. Statistical assessment of the PLS model substantiates that the modelled ca. 50% of the data variance is a valid and significant damped prediction. It captures the geologically relevant stratigraphic succession of intervals with generally high and low permeability and porosity, respectively. The aim here has been to delineate the same stratigraphic intervals as the reference core parameters, rather than to model all the variance in the raw laboratory data.

Table 3 and Fig. 3 show that the laboratory variables SRT, CBNT, CON and RDNS (N, P, R, S) are not predictable with the presently restricted set of two (or four) Y-space PLS components. A more comprehensive model may improve on this or, alternatively, show that the X-data are irrelevant in this context.

Consider areas where prospective drilling is being actively carried out and where one well has been calibrated with a suitable laboratory data set. There, SIMCA/PLS methodology has significant potential for easy and quick assessment of both permeability and porosity directly from wire-line logs. Water saturation is an obvious further candidate to include in the suite of prediction variables, as are other petrophysical parameters of interest to the oil-prospecting industry.

ACKNOWLEDGEMENTS
This work was financed by the Norwegian oil companies Statoil and Norsk Hydro, whom we thank for their support and kind permission to publish these results. Depth assignments for specific horizons have been deliberately left out (Fig. 1). We thank Dr. Richard Eastgate, Norsk Hydro, and Dr. H.J.B. Birks for constructive criticism of the manuscript. The Editor, O. Kvalheim, deserves much praise for his high standards. We express our sincere thanks to Ellen Sandberg for expert help with the data modelling, and to Berit Østby and Bodil Engejordet for excellent typing and editing.
REFERENCES

[1] K. Esbensen, L. Lindqvist, I. Lundholm, D. Nisca and S. Wold, Multivariate modelling of geochemical and geophysical exploration data, Chemometrics and Intelligent Laboratory Systems, 2 (1987) 161-175.
[2] S. Wold, C. Albano, W.J. Dunn III, U. Edlund, K. Esbensen, P. Geladi, S. Hellberg, E. Johansson, W. Lindberg and M. Sjöström, Multivariate data analysis in chemistry, in B.R. Kowalski (Editor), Chemometrics: Mathematics and Statistics in Chemistry, Vol. 138, Reidel, Dordrecht, 1984, pp. 17-95.
[3] S. Wold, H. Martens and H. Wold, The multivariate calibration problem in chemistry solved by the PLS method, in A. Ruhe and B. Kågström (Editors), Proc. Conf. on Matrix Pencils, Piteå, Sweden, Lecture Notes in Mathematics, Springer, Heidelberg, 1983, pp. 286-293.
[4] H. Martens, Multivariate Calibration: Quantitative Interpretation of Non-selective Chemical Data, Dr. Techn. Thesis, NTH, Trondheim, Norway, 1985; 2nd printing, NCC Reports 786 (Part I) and 787 (Part II), Norwegian Computing Center, Oslo, 1986.
[5] P.J. Brown, Multivariate calibration, Journal of the Royal Statistical Society, Series B, 44 (1982) 287-308.
[6] T. Næs and H. Martens, Comparison of prediction methods for collinear data, Communications in Statistics, 14 (1985) 545-576.
[7] K.G. Jöreskog and H. Wold (Editors), Systems under Indirect Observation: Causality, Structure, Prediction, Vols. I and II, North-Holland, Amsterdam, 1980.
[8] S. Wold, Cross-validatory estimation of the number of components in factor and principal components analysis, Technometrics, 20 (1978) 391-406.