3 MULTIPLE REGRESSION

This section summarizes Douglas Montgomery's [13] work on multiple regression together with basic definitions and results from statistics [14]. The aim of curve fitting is to express the relationship between two or more variables in mathematical form by determining the equation connecting the variables. If we assume a linear relationship between the dependent variable y and the independent variables x_i (i = 1, 2, ..., m), then we seek the equation connecting the variables, which has the form:
y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_m x_m + \varepsilon ,    (3.1)
where the unknown parameters {a_i} are the regression coefficients and \varepsilon represents the random error. The regression coefficients {a_i} are obtained with the least squares method. Equation (3.1) represents a plane in the m-dimensional rectangular coordinate system of the regressors. We assume that the number of observations is greater than the number of regressor variables, n > m; the observed values x_ij are shown in Table 3.1. The estimation procedure requires the mathematical expectation of the random errors to be E[\varepsilon] = 0, their variance to be E[\varepsilon^2] = \sigma^2, and the {\varepsilon_j} to be mutually uncorrelated.

Table 3.1. Data for linear multiple regression [13]
y      x_1     x_2     ...    x_m
y_1    x_11    x_21    ...    x_m1
y_2    x_12    x_22    ...    x_m2
...    ...     ...     ...    ...
y_n    x_1n    x_2n    ...    x_mn
In this case, we can write the model for the j-th observation in the form:
y_j = a_0 + a_1 x_{1j} + a_2 x_{2j} + \cdots + a_m x_{mj} + \varepsilon_j , \qquad j = 1, 2, \ldots, n    (3.2)
Equation (3.2) can also be written in matrix notation:
y = x a + \varepsilon ,    (3.3)

where y is an (n \times 1) vector, x is an (n \times (m+1)) matrix, a is an ((m+1) \times 1) vector of regression coefficients and \varepsilon is an (n \times 1) vector of random errors:

y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} , \quad
x = \begin{bmatrix} 1 & x_{11} & x_{21} & \cdots & x_{m1} \\ 1 & x_{12} & x_{22} & \cdots & x_{m2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1n} & x_{2n} & \cdots & x_{mn} \end{bmatrix} , \quad
a = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix} , \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
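As an illustration, the design matrix x of equation (3.3) can be assembled directly from the observed columns. Below is a minimal sketch in Python with NumPy; the arrays x1, x2 and y are purely hypothetical stand-ins for the data of Table 3.1.

```python
import numpy as np

# Hypothetical observations standing in for Table 3.1: n = 5, m = 2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.5, 3.5, 4.0, 6.0])
y = np.array([3.1, 4.0, 6.8, 8.1, 11.0])

# Design matrix x of equation (3.3): a leading column of ones for the
# intercept a0, followed by one column per regressor variable.
x = np.column_stack([np.ones_like(x1), x1, x2])
print(x.shape)  # (5, 3), i.e. (n, m + 1)
```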
The vector of regression coefficients a is found by the least squares method, i.e. by minimizing:

L = \sum_{j=1}^{n} \varepsilon_j^2 = \varepsilon^T \varepsilon = (y - x a)^T (y - x a)    (3.4)
The superscript T denotes the transpose of a matrix or a vector; for example, the vector a and its transpose a^T are:
a = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix} \quad \text{and} \quad a^T = [a_0, a_1, a_2, \ldots, a_{m-1}, a_m]
The least squares estimators must satisfy the equation:
\frac{\partial L}{\partial a} = -2 x^T y + 2 x^T x a = 0 ,    (3.5)
which simplifies to
x^T x a = x^T y .    (3.6)
To solve equation (3.6), we multiply both sides by the inverse of the matrix x^T x. The least squares estimator of a is:
a = (x^T x)^{-1} x^T y ,    (3.7)
or in the explicit matrix form:
\begin{bmatrix}
n & 0 & 0 & \cdots & 0 \\
0 & S_{11} & S_{12} & \cdots & S_{1m} \\
0 & S_{12} & S_{22} & \cdots & S_{2m} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & S_{1m} & S_{2m} & \cdots & S_{mm}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}
=
\begin{bmatrix} \sum_{i=1}^{n} y_i \\ S_{1y} \\ S_{2y} \\ \vdots \\ S_{my} \end{bmatrix}    (3.8)
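A minimal sketch of the estimator (3.7) in Python/NumPy, again with hypothetical data; solving the normal equations (3.6) directly is usually preferable to forming the inverse explicitly.

```python
import numpy as np

# Hypothetical data with the layout of Table 3.1 (n = 5, m = 2).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.5, 3.5, 4.0, 6.0])
y = np.array([3.1, 4.0, 6.8, 8.1, 11.0])
x = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares estimate of a from equation (3.7): solve x^T x a = x^T y.
a = np.linalg.solve(x.T @ x, x.T @ y)

# Numerically more robust equivalent based on an orthogonal factorization.
a_lstsq, *_ = np.linalg.lstsq(x, y, rcond=None)
print(a, a_lstsq)
```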
3.1 Hypothesis testing in multiple linear regression
In multiple linear regression, we wish to test hypotheses about the model parameters a. In particular, it is important to test for the significance of regression, which is accomplished by testing [13]:
H_0 : a_1 = a_2 = \cdots = a_m = 0
H_1 : a_i \neq 0 \quad \text{for at least one } i    (3.9)
The rejection of H_0 in equation (3.9) implies that at least one variable in the model contributes significantly to the fit. The total sum of squares is partitioned into regression and error sums of squares, S_{yy} = SS_R + SS_E, where SS_E and SS_R are:

SS_E = \sum_{i=1}^{n} (y_i - y_{est,i})^2 ; \qquad SS_R = \sum_{i=1}^{n} (y_{est,i} - \bar{y})^2 ,    (3.10)
The test procedure for H_0 in equation (3.9) is based on the statistic:
F_0 = \frac{SS_R / m}{SS_E / (n - m - 1)} = \frac{MS_R}{MS_E}    (3.11)
and H_0 is rejected if F_0 > F_{\alpha, m, n-m-1}. The procedure is usually summarized in an analysis of variance table, such as Table 3.2 [13].
Table 3.2. Analysis of variance table

Source of variation    Sum of squares   Degrees of freedom   Mean square   Fisher statistic F_0
Regression             SS_R             m                    MS_R          MS_R / MS_E
Error or residual      SS_E             n - m - 1            MS_E
Total                  S_yy             n - 1
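The ANOVA quantities of Table 3.2 follow directly from equations (3.10) and (3.11). A small sketch with hypothetical data (SciPy supplies the critical value F_{α, m, n−m−1}):

```python
import numpy as np
from scipy import stats

# Hypothetical data: n = 12 observations of m = 2 regressors.
rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 10.0, 12)
x2 = rng.uniform(0.0, 5.0, 12)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + rng.normal(0.0, 0.3, 12)

n, m = len(y), 2
x = np.column_stack([np.ones(n), x1, x2])
a = np.linalg.solve(x.T @ x, x.T @ y)
y_est = x @ a

# Sums of squares, equation (3.10), and the F statistic, equation (3.11).
ss_e = np.sum((y - y_est) ** 2)         # error (residual) sum of squares
ss_r = np.sum((y_est - y.mean()) ** 2)  # regression sum of squares
f0 = (ss_r / m) / (ss_e / (n - m - 1))

# H0 is rejected at level alpha if F0 exceeds F_{alpha, m, n-m-1}.
alpha = 0.05
f_crit = stats.f.ppf(1.0 - alpha, m, n - m - 1)
print(f0, f_crit, f0 > f_crit)
```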
We are frequently interested in testing hypotheses on the individual regression coefficients. The hypotheses for testing the significance of any individual coefficient, for example a_i, are:
H_0 : a_i = 0
H_1 : a_i \neq 0    (3.12)
The appropriate test statistic is:

t_0 = \frac{a_i}{\sqrt{MS_E \, C_{ii}}} ,    (3.13)

where C_{ii} is the i-th diagonal element of (x^T x)^{-1}. The hypothesis H_0 : a_i = 0 is rejected if |t_0| > t_{\alpha/2, n-m-1}.
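The same quantities give the t statistic of equation (3.13) for every coefficient. A short sketch with hypothetical data, using C = (x^T x)^{-1} as above:

```python
import numpy as np
from scipy import stats

# Hypothetical data: n = 12 observations of m = 2 regressors.
rng = np.random.default_rng(1)
x1 = rng.uniform(0.0, 10.0, 12)
x2 = rng.uniform(0.0, 5.0, 12)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + rng.normal(0.0, 0.3, 12)

n, m = len(y), 2
x = np.column_stack([np.ones(n), x1, x2])
c = np.linalg.inv(x.T @ x)              # C = (x^T x)^{-1}
a = c @ x.T @ y
ms_e = np.sum((y - x @ a) ** 2) / (n - m - 1)

# t statistic of equation (3.13) for each coefficient a_i;
# H0: a_i = 0 is rejected if |t0| exceeds t_{alpha/2, n-m-1}.
t0 = a / np.sqrt(ms_e * np.diag(c))
t_crit = stats.t.ppf(1.0 - 0.05 / 2.0, n - m - 1)
print(t0, t_crit, np.abs(t0) > t_crit)
```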
3.1.1 Coefficient of determination

The coefficient of determination is the square of the correlation coefficient r and is defined as [13, 14]:

r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - y_{est,i})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\sum_{i=1}^{n} (y_{est,i} - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\text{Explained variation}}{\text{Total variation}} .    (3.14)
The coefficient of determination r^2 can be interpreted as the fraction of the total variation that is explained by the least squares regression; it thus measures how well the least squares regression fits the sample data. r^2 lies between 0 and 1. The definition (3.14) holds for non-linear correlation as well. Another important statistic is the adjusted r^2:

r^2_{adj} = 1 - \left( \frac{n - 1}{n - m - 1} \right) (1 - r^2)

The advantage of the adjusted statistic r^2_{adj} is that it does not automatically increase when a new variable is added to the model.
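Both statistics are easily computed from the fitted values; a brief sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data: n = 12 observations of m = 2 regressors.
rng = np.random.default_rng(2)
x1 = rng.uniform(0.0, 10.0, 12)
x2 = rng.uniform(0.0, 5.0, 12)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + rng.normal(0.0, 0.3, 12)

n, m = len(y), 2
x = np.column_stack([np.ones(n), x1, x2])
a = np.linalg.solve(x.T @ x, x.T @ y)
y_est = x @ a

# Coefficient of determination, equation (3.14), and its adjusted form.
r2 = 1.0 - np.sum((y - y_est) ** 2) / np.sum((y - y.mean()) ** 2)
r2_adj = 1.0 - (n - 1) / (n - m - 1) * (1.0 - r2)
print(r2, r2_adj)
```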
3.1.2 Other linear and non-linear models

The linear model y = x a + \varepsilon is general: it describes any relation that is linear in the unknown parameters a. By transforming the equations, it is in many cases possible to obtain such a linear model. The expression most frequently encountered in empirical correlations is a non-linear relation in the form of a product of power-law terms:
Y = a_0 X_1^{a_1} X_2^{a_2} \cdots X_{m-1}^{a_{m-1}} X_m^{a_m} ,    (3.15)
Equation (3.15) can easily be transformed into a linear model. If the variables X_1, X_2, ..., X_m are independent of each other, we can take the logarithm of equation (3.15):
y = \log Y = \log(a_0 X_1^{a_1} X_2^{a_2} \cdots X_{m-1}^{a_{m-1}} X_m^{a_m})
  = \log a_0 + a_1 \log X_1 + a_2 \log X_2 + \cdots + a_{m-1} \log X_{m-1} + a_m \log X_m    (3.16)
  = a_0' + a_1 x_1 + a_2 x_2 + \ldots + a_{m-1} x_{m-1} + a_m x_m ,

where a_0' = \log a_0 and x_i = \log X_i. The final form of equation (3.16) is simple and represents a linear relation between the transformed value y and the transformed values x_1, x_2, ..., x_m. Equation (3.15) can, of course, also be solved directly as a non-linear equation. In this case, gradient methods are used, most frequently the Levenberg–Marquardt method. In order to solve such a non-linear system of equations successfully, properly chosen initial values of the parameters a_i are essential; these can most easily be determined from the solution of the linear problem.

3.1.3 Computer printout

Computer programmes are used extensively in regression analysis. The output of one such programme, SPSS [15], is shown below. First, we have to determine the input variables. The computer printout is presented here with a practical example with six independent variables. The regression equation can be expressed as:
d_V = a_0 \Pi_1^{a_1} \Pi_2^{a_2} \Pi_3^{a_3} \Pi_4^{a_4} \Pi_5^{a_5} \Pi_6^{a_6} .    (3.17)
The regression equation (3.17) can be transformed into the form of equation (3.16). In this manner, a linear model is obtained:
y = \log d_V = \log(a_0 \Pi_1^{a_1} \Pi_2^{a_2} \Pi_3^{a_3} \Pi_4^{a_4} \Pi_5^{a_5} \Pi_6^{a_6})
  = \log a_0 + a_1 \log \Pi_1 + a_2 \log \Pi_2 + a_3 \log \Pi_3 + a_4 \log \Pi_4 + a_5 \log \Pi_5 + a_6 \log \Pi_6    (3.18)
  = a_0' + a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 + a_5 x_5 + a_6 x_6
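The linearized model can be fitted with any least squares routine. The sketch below uses Python's statsmodels library instead of SPSS, with entirely hypothetical stand-ins for the measured groups Π_1 ... Π_6 and d_V; its summary output contains the same quantities (r², ANOVA, coefficients, t and Sig. values) as the SPSS printout discussed below.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical stand-ins for the dimensionless groups Pi_1 ... Pi_6 and the
# measured d_V; in practice these come from the experiments.
rng = np.random.default_rng(3)
Pi = rng.uniform(0.5, 2.0, size=(36, 6))
dV = 2.0 * Pi[:, 0] ** -0.35 * Pi[:, 4] ** -0.40 * np.exp(rng.normal(0.0, 0.05, 36))

# Log transform as in equation (3.18): x_i = log Pi_i, y = log d_V.
X = sm.add_constant(np.log10(Pi))
y = np.log10(dV)

# Ordinary least squares fit of the linearized model (Model 1).
model = sm.OLS(y, X).fit()
print(model.summary())
```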
We designate equation (3.18) as Model 1.

Table 3.3.

Model   Variables entered          Variables removed   Method
1       X6, X1, X5, X4, X3, X2     -                   Enter

All requested variables entered. Dependent variable: Y.

Let us take a look at the computer printout for the six independent variables. From the printout we first obtain the correlation coefficient r, the coefficient of determination r^2, the adjusted r^2 and the standard error of the estimate (SEE). These are presented in Table 3.4. The model explains almost 88 % of the joint variance, since r^2 = 0.878.

Table 3.4. Model summary

Model   R       R square   Adjusted R square   Std. error of the estimate
1       0.938   0.878      0.852               2.267E-02
a Predictors: (Constant), X6, X1, X5, X3, X2, X4
b Dependent variable: Y

The analysis of variance helps us to ascertain whether the regression model is statistically significant.

Table 3.5.

Model        Sum of squares   Degrees of freedom   Mean square   F        Sig.
Regression   0.107            6                    1.781E-02     34.647   0.000
Residual     1.491E-02        29                   5.141E-04
Total        0.122            35

a Predictors: (Constant), X6, X1, X5, X3, X2, X4
b Dependent variable: Y
Finally, we check the importance of the individual predictors in explaining the regression criterion. For this purpose, we use the t-test. Since the t-test can be misleading in individual cases, we perform the regression analysis in stages, adding the variables one after another as long as they contribute to explaining the criterion. The data in Table 3.6 lead us to the conclusion that all the predictors are statistically significant and important. The least important are the coefficients a_4 and a_6, but their contribution is still considerable.

Table 3.6. Coefficients

           Unstandardized    Std. error   Standardized   t        Sig.
           coefficient B                  coefficient β
Constant   a_0 = 145.789     73.085                      1.995    0.056
X1         a_1 = -0.353      0.167        -0.532         -2.106   0.044
X2         a_2 = -0.266      0.162        -0.422         -1.641   0.112
X3         a_3 = 211.113     104.867      0.384          2.013    0.053
X4         a_4 = -9.697E-02  0.099        -0.181         -0.983   0.334
X5         a_5 = -25.781     7.754        -0.404         -3.325   0.002
X6         a_6 = 0.103       0.093        0.147          1.108    0.277
a Dependent variable: Y

The importance of the predictors can most easily be presented graphically, Fig. 3.1.

Equation (3.15) can also be tackled by solving the non-linear equations directly. In this case, the system of equations is solved with the Levenberg–Marquardt method. It is necessary to specify the initial values of a_1, a_2, ..., a_6; because this is an iterative procedure, an iteration precision is required as well, here 1×10^{-10}. First, the analysis of variance helps us to ascertain whether the regression model is statistically significant. Table 3.7 clarifies this: the coefficient of determination is very high.
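A hedged sketch of such a non-linear fit in Python, using SciPy's Levenberg–Marquardt implementation rather than the solver actually used, with hypothetical stand-ins for the measured data and with the constraint a_0 = 1 built into the model (as in Table 3.8):

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical stand-ins for the measured groups Pi_1 ... Pi_6 and d_V.
rng = np.random.default_rng(4)
Pi = rng.uniform(0.5, 2.0, size=(36, 6))
dV = Pi[:, 0] ** -0.35 * Pi[:, 4] ** -0.40 * (1.0 + rng.normal(0.0, 0.02, 36))

def residuals(a, Pi, dV):
    # Model of equation (3.17) with a0 fixed to 1, as in Table 3.8.
    return np.prod(Pi ** a, axis=1) - dV

# Illustrative initial values for a1 ... a6 (e.g. taken from the linearized
# fit) and an iteration tolerance of about 1e-10, as mentioned in the text.
a_init = np.array([-0.35, -0.25, 2.0, -0.10, -0.40, 0.10])
result = least_squares(residuals, a_init, args=(Pi, dV),
                       method="lm", xtol=1e-10, ftol=1e-10)
print(result.x)
```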
Table 3.8 leads us to the conclusion that all the parameters are of great importance. A special requirement this time is that the coefficient a_0 = 1. Caution is required when interpreting r^2 for the linear and the non-linear model: in the linear model, the value of the coefficient of determination r^2 refers to the transformed dependent variable log Y.
[Scatterplot: regression standardized predicted value versus log Y]
Fig. 3.1. Comparison between the measured and the regression model (SPSS graph).

Table 3.7.

Source              Degrees of freedom   Sum of squares   Mean square   F         Sig.
Regression          6                    1206.33906       201.05651     2211.11   0.000
Residual            30                   2.72794          0.09093
Uncorrected total   36                   1209.06700
(Corrected total)   35                   21.80512

r^2 = 1 - Residual SS / Corrected SS = 0.87489
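This value follows directly from the residual and corrected total sums of squares listed in Table 3.7:

```python
# r^2 of the non-linear fit from the sums of squares in Table 3.7.
residual_ss = 2.72794
corrected_total_ss = 21.80512
r2 = 1.0 - residual_ss / corrected_total_ss
print(round(r2, 5))  # 0.87489
```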
Table 3.8.

Variable   a_i (non-standardized)   Std. error   t        Sig.      95 % CI lower bound   95 % CI upper bound
Constant   a_0 = 1
X1         a_1 = -0.358             0.158        -2.272   0.03039   -0.680                -0.036
X2         a_2 = -0.245             0.155        -1.585   0.12342   -0.561                0.071
X3         a_3 = 2.189              1.050        2.085    0.04568   0.045                 4.334
X4         a_4 = -0.255             0.073        -3.492   0.00151   -0.404                -0.106
X5         a_5 = -36.753            5.672        -6.480   3.7E-07   -48.337               -25.170
X6         a_6 = 0.144              0.096        1.490    0.14662   -0.053                0.340