CHAPTER 3
TWO-LEVEL COMPLETE FACTORIAL DESIGNS: 2^k
1. INTRODUCTION
Two-level factorial designs are the simplest, but they are widely used because they can be applied to many situations as either complete or fractional designs. This chapter deals with complete designs. We will first examine a simplified example using only two factors. It will allow us to introduce several important basic concepts which will be used in later chapters. We will analyse a three-factor design and extrapolate the ideas acquired in this first example to an actual experimental design having five factors. Lastly, we will use the matrix approach to interpret two-factor complete factorial designs.
2. COMPLETE THREE FACTOR DESIGN: 2^3
2.1. Example: The stability of a bitumen emulsion
The Problem:
A manufacturer of bitumen emulsion wants to develop a new formulation. He has two bitumens, A and B. He wants to know the effects of a surfactant (fatty acid) and hydrochloric acid on the stability of the emulsion.
As there are three factors, he decides to use a 2^3 design with the following factors and response.
Factors:
Factor 1: high and low fatty acid concentrations.
Factor 2: diluted and concentrated HCl.
Factor 3: bitumen A and B.
Response: Emulsion stability index, measured in stability points. The scientist knows that the experimental error of the response is plus or minus two stability points. He wishes to find the most stable emulsion: the one with the lowest stability index.
Domain: The two levels of each factor are indicated by +1 and -1 as reduced centred (or coded) variables. The experimental domain is a cube (Figure 3.1) and the eight experimental points chosen are at the corners of the cube.
Figure 3.1: Distribution of experimental points within the experimental domain of a 2^3 design.
The experimental matrix (Table 3.1) is constructed in the same way as for the 2^2 design, but contains eight and not four experiments. To simplify Table 3.1 we have used the signs + and - without the figure 1. The factors studied are not necessarily continuous variables, and two-level factorial designs may include both continuous and non-continuous or discrete variables.
TABLE 3.1 EXPERIMENTAL MATRIX STABILITY OF A BITUMEN EMULSION

Trial no     Factor 1        Factor 2        Factor 3      Response
             (fatty acid)    (HCl)           (bitumen)
1            -               -               -             38
2            +               -               -             37
3            -               +               -             26
4            +               +               -             24
5            -               -               +             30
6            +               -               +             28
7            -               +               +             19
8            +               +               +             16

Level (-)    low conc.       diluted         A
Level (+)    high conc.      concentrated    B
The effects of each factor and the interaction values are calculated from the effects matrix (Table 3.2) as they were for the 2^2 design, i.e. by taking the experimental matrix signs for the main factors and using the sign rule for the interactions. The effects and interactions are obtained by a three-step calculation:
1. The response is multiplied by the corresponding sign in the factor (or interaction) column.
2. The products obtained are added.
3. The sum so obtained is divided by a coefficient equal to the number of experiments.
For example, the effect of factor 3 is obtained from the formula:
$$E_3 = \frac{1}{8}\left[-38 - 37 - 26 - 24 + 30 + 28 + 19 + 16\right] = -4$$
Similarly, the third order interaction, 123, is obtained from:
$$E_{123} = \frac{1}{8}\left[-38 + 37 + 26 - 24 + 30 - 28 - 19 + 16\right] = 0$$
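TABLE 3.2 EFFECTS MATRIX STABILITY OF A BITUMEN EMULSION

Trial no   Mean    Factor  Factor  Factor  Inter.  Inter.  Inter.  Inter.  Response
                   1       2       3       12      13      23      123
1          +       -       -       -       +       +       +       -       38
2          +       +       -       -       -       -       +       +       37
3          +       -       +       -       -       +       -       +       26
4          +       +       +       -       +       -       -       -       24
5          +       -       -       +       +       -       -       +       30
6          +       +       -       +       -       +       -       -       28
7          +       -       +       +       -       -       +       -       19
8          +       +       +       +       +       +       +       +       16

Effects    27.25   -1      -6      -4      -0.25   -0.25   0.25    0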
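The three-step calculation can be sketched in a few lines of Python. This is only an illustration: the names `design`, `responses` and `effect` are introduced here and do not come from the text, and the responses are those of trials 1 to 8 in standard order.

```python
from itertools import product
from math import prod

# Responses of trials 1..8 in standard order (bitumen emulsion example).
responses = [38, 37, 26, 24, 30, 28, 19, 16]

# Standard order: factor 1 alternates fastest, factor 3 slowest.
# itertools.product varies its LAST element fastest, so each tuple is reversed.
design = [levels[::-1] for levels in product((-1, +1), repeat=3)]


def effect(columns):
    """Effect of the factors listed in `columns` (1-based); () gives the mean."""
    total = 0
    for levels, y in zip(design, responses):
        sign = prod(levels[c - 1] for c in columns)  # sign rule for interactions
        total += sign * y                            # steps 1 and 2
    return total / len(responses)                    # step 3


effects = {
    name: effect(cols)
    for name, cols in {
        "mean": (), "1": (1,), "2": (2,), "3": (3,),
        "12": (1, 2), "13": (1, 3), "23": (2, 3), "123": (1, 2, 3),
    }.items()
}

for name, value in effects.items():
    print(f"{name:>4} {value:6.2f}")
```

The dictionary keys mirror the column names of the effects matrix, so the printed values can be compared directly with the last row of Table 3.2.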
The experimenter then analyses the results by drawing up a table of effects indicating, whenever possible, the experimental error estimated by the standard deviation (Table 3.3).
TABLE 3.3 TABLE OF EFFECTS STABILITY OF A BITUMEN EMULSION

Mean     27.25   ± 0.7 points

1        -1.00   ± 0.7 points
2        -6.00   ± 0.7 points
3        -4.00   ± 0.7 points

12       -0.25   ± 0.7 points
13       -0.25   ± 0.7 points
23        0.25   ± 0.7 points

123       0.00   ± 0.7 points
We can now begin to interpret these results. All the interactions are smaller than the standard deviation. These can therefore be considered to be zero and neglected. The effects of factors 2 and 3 are much greater than the standard deviation, and these factors thus have an influence, while the effect of factor 1 is just a little larger than one standard deviation and smaller than two standard deviations. It is thus unlikely to have any influence.
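Figure 3.2: Effect of hydrochloric acid (factor 2) on bitumen emulsion stability. (Vertical axis: stability; horizontal axis: HCl concentration. The stability index falls from 33.25 at level -1, dilute, through 27.25 at level 0 to 21.25 at level +1, concentrated.)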
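Figure 3.3: Effect of bitumen type (factor 3) on bitumen emulsion stability. (Vertical axis: stability; horizontal axis: bitumen. The stability index falls from 31.25 at level -1, bitumen A, through 27.25 at level 0 to 23.25 at level +1, bitumen B.)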
Therefore, the concentration of fatty acid (factor 1) probably has no influence on emulsion stability over the range of concentrations tested. The plane passing through the centre of the experimental domain and parallel to factor 2 shows the effect of hydrochloric acid. The plane passing through the centre of the experimental domain and parallel to factor 3 reveals the effect of bitumen. We can now state the results of the experiment:
Results:
The fatty acid concentration has little or no influence on the emulsion stability. The hydrochloric acid concentration has a large effect. The type of bitumen used is also important; the best stability (lowest response) will be obtained with type B and concentrated HCl. There is no significant interaction.
Note:
A negative effect is not necessarily an undesirable one. An effect is negative when the response falls as the factor increases from -1 to +1. Conversely, a positive effect occurs when the response increases as the corresponding factor goes from -1 to +1.
3. THE BOX NOTATION
We could also use the Box notation [8] to indicate the effects and interactions. With this notation $E_1$ is represented by a bold figure 1 ($\mathbf{1}$), and $E_2 = \mathbf{2}$, $E_3 = \mathbf{3}$, etc. The mean is represented by the letter I. The general formulae for the effects and interactions of a 2^3 design are:
Mean:
$$I = \frac{1}{8}\left[+y_1 + y_2 + y_3 + y_4 + y_5 + y_6 + y_7 + y_8\right]$$
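For reference, the remaining formulae can be written out by applying the sign rule to the standard-order 2^3 matrix. The block below is a reconstruction under that assumption, not a copy of the author's layout; $y_1 \dots y_8$ are the responses of trials 1 to 8 and the bold symbols follow the Box notation.

```latex
\begin{aligned}
\mathbf{1}   &= \tfrac{1}{8}\left[-y_1 + y_2 - y_3 + y_4 - y_5 + y_6 - y_7 + y_8\right] \\
\mathbf{2}   &= \tfrac{1}{8}\left[-y_1 - y_2 + y_3 + y_4 - y_5 - y_6 + y_7 + y_8\right] \\
\mathbf{3}   &= \tfrac{1}{8}\left[-y_1 - y_2 - y_3 - y_4 + y_5 + y_6 + y_7 + y_8\right] \\
\mathbf{12}  &= \tfrac{1}{8}\left[+y_1 - y_2 - y_3 + y_4 + y_5 - y_6 - y_7 + y_8\right] \\
\mathbf{13}  &= \tfrac{1}{8}\left[+y_1 - y_2 + y_3 - y_4 - y_5 + y_6 - y_7 + y_8\right] \\
\mathbf{23}  &= \tfrac{1}{8}\left[+y_1 + y_2 - y_3 - y_4 - y_5 - y_6 + y_7 + y_8\right] \\
\mathbf{123} &= \tfrac{1}{8}\left[-y_1 + y_2 + y_3 - y_4 + y_5 - y_6 - y_7 + y_8\right]
\end{aligned}
```

The signs in each bracket are the entries of the corresponding column of the effects matrix, read from trial 1 to trial 8.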
4. RECONSTRUCTING TWO 2^2 DESIGNS FROM A 2^3 DESIGN
Examining the results in a little more detail, we see that, as factor 1 is without influence, the experimental domain is reduced to a design in which only factors 2 and 3 have any influence. This also indicates that the response does not depend on the level of factor 1, but only on the levels of factors 2 and 3. The responses can therefore be rearranged in pairs ignoring the factor 1 level, as shown in the following table (Table 3.4).

TABLE 3.4 EXPERIMENTAL MATRIX REARRANGED STABILITY OF A BITUMEN EMULSION

Trial no    Factor 2    Factor 3    Response    Response    Average
1, 2        -           -           38          37          37.5
3, 4        +           -           26          24          25.0
5, 6        -           +           30          28          29.0
7, 8        +           +           19          16          17.5
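The rearrangement can be sketched as follows, assuming the standard-order levels and the responses of the bitumen example; the names `trials`, `groups` and `averages` are illustrative only.

```python
from collections import defaultdict

# (factor 1, factor 2, factor 3) levels and response for trials 1..8.
trials = [
    ((-1, -1, -1), 38), ((+1, -1, -1), 37),
    ((-1, +1, -1), 26), ((+1, +1, -1), 24),
    ((-1, -1, +1), 30), ((+1, -1, +1), 28),
    ((-1, +1, +1), 19), ((+1, +1, +1), 16),
]

# Group the responses by the levels of factors 2 and 3, ignoring factor 1.
groups = defaultdict(list)
for (f1, f2, f3), y in trials:
    groups[(f2, f3)].append(y)

# Each group holds two responses: a 2^2 design carried out twice.
averages = {key: sum(ys) / len(ys) for key, ys in groups.items()}

for (f2, f3), mean in averages.items():
    print(f"factor 2 = {f2:+d}, factor 3 = {f3:+d}: {groups[(f2, f3)]} -> {mean}")
```

Each key of `averages` is a corner of the reduced two-factor domain, so the four values are the ones that appear in the Average column of Table 3.4.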
These results can also be displayed graphically, as in Figure 3.4.
Figure 3.4: The bitumen emulsion is most stable when the hydrochloric acid is concentrated and bitumen B is employed.
5. THE RELATIONSHIP BETWEEN MATRIX AND GRAPHICAL REPRESENTATIONS OF EXPERIMENTAL DESIGN
This relationship is easy to understand for a 2^2 experimental design. An experimental point A can be defined:
1. by its coordinates in a Cartesian two-dimensional space: a on the Ox1 axis (horizontal) and b on the Ox2 axis (vertical), as shown in Figure 3.5. This is the graphical representation. The coordinates a and b can be expressed in centred reduced (or coded) units or in classical units.
Figure 3.5: Geometric representation of experimental points.
2. by the level of the two factors studied: trial A is defined by level a of factor x1 and level b of factor x2. The coordinates of experimental points are the levels indicated in the experimental matrix.
Figure 3.6: The matrix diagram of experimental points is equivalent to the geometric representation.
A set of experiments is defined by several points with geometrical representation and by several trials with matrix representation. Figure 3.6 illustrates these two ways of representing two experimental points and the two corresponding trials. While it is also possible to produce a graphical representation of a three-factor experiment in a three-dimensional space, it is clearly impossible to do so for four or more factors. It is therefore necessary to find a way of representing experimental points in these hyper-spaces which is both convenient and applicable to any number of dimensions. The most common solution is to use matrix representation, which works for any number of factors. Table 3.5 shows four trials defined by the levels of seven factors.
TABLE 3.5
The geometrical counterpart of Table 3.5 is a set of four points defined by their seven coordinates. Hence the experimental matrix gives the location of experimental points in the experimental space. Anyone producing experimental designs must learn to think in n-dimensional space without graphical representation. It is easy to pass from geometrical to matrix representation for two or three factors, and experimenters must become accustomed to switching from n-factor matrices to n-dimensional space and vice versa.
6. CONSTRUCTION OF COMPLETE FACTORIAL DESIGNS
All factorial designs are constructed in the same way as those shown in Tables 2.2, 2.4 and 3.6. The sequence of the signs for factor 1 is:
- + - + - + - +, etc.
They alternate, commencing with a negative (-). The sequence of the signs for factor 2 is a series of two -, followed by two +:
- - + + - - + +, etc.
The sequence for factor 3 is four negatives (-), followed by four positives (+). Any further factors have 8, 16, 32, ... - signs followed by 8, 16, 32, ... + signs. There is always the same number of + and - signs in the column for each factor.
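The construction rule lends itself to a short sketch. The function name `full_factorial` is hypothetical; the only assumption is the rule stated above, namely that factor j (counting from 1) repeats blocks of 2^(j-1) minus signs followed by 2^(j-1) plus signs.

```python
def full_factorial(k):
    """Return the 2^k experimental matrix in standard order as rows of -1/+1."""
    rows = []
    for trial in range(2 ** k):  # 0-based trial index
        # Bit j of the index tells whether factor j+1 is at its high level,
        # so factor 1 alternates every trial, factor 2 every 2 trials, etc.
        rows.append([+1 if (trial >> j) & 1 else -1 for j in range(k)])
    return rows


design = full_factorial(3)
for number, row in enumerate(design, start=1):
    print(number, " ".join("+" if level > 0 else "-" for level in row))
```

Calling `full_factorial(5)` in the same way gives the 32 rows of a five-factor design; every column sums to zero because it holds as many + as - signs.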
7. LABELLING OF TRIALS IN COMPLETE FACTORIAL DESIGNS
When the + and - signs for each factor are laid out as shown above, the trials are numbered sequentially using whole numbers (see Tables 2.2, 2.4 and 3.6). This is standard numbering. As we will see later, the order of the trials can be changed, for randomisation, drift or blocking designs. But the number of each trial will be retained, regardless of its position in the layout. For example, trial number 23 of a complete 2^5 design (Table 3.6) always has the following sequence of levels for factors 1, 2, 3, 4 and 5:
- + + - +
There are other ways of labelling trials, but we shall not discuss them here.
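Because standard numbering follows the sign layout, a trial number and its sequence of levels can be converted into one another. The two helper functions below are hypothetical names used only to illustrate this; they assume trial numbers start at 1 and that factor 1 is the fastest-alternating column.

```python
def trial_levels(trial_number, k):
    """Sign sequence (factor 1 first) of a trial in standard numbering."""
    index = trial_number - 1  # 0-based position in the standard layout
    return "".join("+" if (index >> j) & 1 else "-" for j in range(k))


def trial_number(levels):
    """Inverse lookup: standard trial number for a sign string such as '-++-+'."""
    return 1 + sum(2 ** j for j, sign in enumerate(levels) if sign == "+")


print(trial_levels(23, 5))
print(trial_number("-++-+"))
```

Since the number is tied to the levels and not to the run order, the same lookup still applies after the trials have been randomised or split into blocks.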
8. COMPLETE FIVE FACTOR DESIGNS: 2^5
8.1. Example: Penicillium chrysogenum growth medium
The Problem: This design was used in a study to increase the yield of a penicillin production plant. It was reported by Owen L. Davies [11] in his book "The design and analysis of industrial experiments". Penicillium chrysogenum is grown in a complex medium, and the experimenter wanted to know the influence of five factors:
Factor 1: concentration of corn liquor.
Factor 2: concentration of lactose.
Factor 3: concentration of precursor.
Factor 4: concentration of sodium nitrate.
Factor 5: concentration of glucose.
The response was the yield of penicillin, as weight (the units were not given in the original text). The experimental matrix of the 2^5 design (Table 3.6) summarizes the experimental data and the results of each of the 32 trials.
TABLE 3.6 EXPERIMENTAL MATRIX PENICILLIUM CHRYSOGENUM GROWTH MEDIUM

Trial no    Factor 1      Factor 2    Factor 3      Factor 4      Factor 5    Response
            (corn liq.)   (lactose)   (precursor)   (sod. nit.)   (glucose)
1           -             -           -             -             -           142
2           +             -           -             -             -           114
3           -             +           -             -             -           129
4           +             +           -             -             -           109
5           -             -           +             -             -           185
6           +             -           +             -             -           162
7           -             +           +             -             -           200
8           +             +           +             -             -           172
9           -             -           -             +             -           148
10          +             -           -             +             -           108
11          -             +           -             +             -           146
12          +             +           -             +             -            95
13          -             -           +             +             -           200
14          +             -           +             +             -           164
15          -             +           +             +             -           215
16          +             +           +             +             -           118
17          -             -           -             -             +           106
18          +             -           -             -             +           106
19          -             +           -             -             +            88
20          +             +           -             -             +            98
21          -             -           +             -             +           113
22          +             -           +             -             +            88
23          -             +           +             -             +           166
24          +             +           +             -             +            79
25          -             -           -             +             +           101
26          +             -           -             +             +           114
27          -             +           -             +             +           140
28          +             +           -             +             +            72
29          -             -           +             +             +           130
30          +             -           +             +             +            83
31          -             +           +             +             +           145
32          +             +           +             +             +           110

Level (-)   2%            2%          0             0             0
Level (+)   3%            3%          0.05%         0.3%          0.5%
The effects were calculated by the standard procedure and the results are shown in the table of effects (Table 3.7).

TABLE 3.7 TABLE OF EFFECTS PENICILLIUM CHRYSOGENUM GROWTH MEDIUM

Mean     129.6

1        -17.6
2          0.6
3         16.1
4          1.0
5        -20.9

12        -5.9
13        -6.1
14        -5.0
15         2.6
23         4.4
24        -1.0
25         3.0
34        -1.0
35       -10.5
45         2.2

123       -1.3
124       -2.9
125       -1.6
134        1.7
135       -3.2
145        2.8
234       -2.6
235        2.7
245        2.3
345        0.6

1234       4.0
1235       2.6
1245       1.8
1345       4.2
2345       1.1

12345      6.3
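As a cross-check on the standard procedure, the mean, the 5 main effects and the 26 interactions can be recomputed from the 32 responses. The sketch below assumes the responses are listed in standard trial order; the helper `level` and the dictionary `effects` are names introduced for the illustration.

```python
from itertools import combinations
from math import prod

# Responses of trials 1..32 in standard order (Penicillium chrysogenum example).
responses = [
    142, 114, 129, 109, 185, 162, 200, 172,
    148, 108, 146,  95, 200, 164, 215, 118,
    106, 106,  88,  98, 113,  88, 166,  79,
    101, 114, 140,  72, 130,  83, 145, 110,
]
k = 5
n = len(responses)


def level(trial_index, factor):
    """Level (-1 or +1) of `factor` (1-based) in the trial with 0-based index."""
    return +1 if (trial_index >> (factor - 1)) & 1 else -1


effects = {"mean": sum(responses) / n}
for order in range(1, k + 1):
    for factors in combinations(range(1, k + 1), order):
        name = "".join(str(f) for f in factors)
        # Sign rule: the sign of an interaction is the product of the factor signs.
        total = sum(
            prod(level(i, f) for f in factors) * y
            for i, y in enumerate(responses)
        )
        effects[name] = total / n

for name, value in effects.items():
    print(f"{name:>6} {value:7.1f}")
```

The printed values agree with the table of effects to within rounding of the last digit (for example, interaction 134 is exactly 1.75).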
Analysis of the effects of the factors showed that two factors have no influence:
• Factor 2, the concentration of lactose.
• Factor 4, the concentration of sodium nitrate.
And that the effects of three factors are significant:
• Factor 1, the concentration of corn liquor.
• Factor 3, the precursor concentration.
• Factor 5, the glucose concentration.
A second order interaction appears to be significant:
• Interaction 35, between precursor and glucose.
Interaction 12345 seemed to be abnormally large. We will leave this for the time being, but come back to it later.
TABLE 3.8 EXPERIMENTAL MATRIX REARRANGED PENICILLIUM CHRYSOGENUM GROWTH MEDIUM

Trial no           Factor 1   Factor 3   Factor 5   Results                Average
1, 3, 9, 11        -          -          -          142  129  148  146     141.25
2, 4, 10, 12       +          -          -          114  109  108   95     106.50
5, 7, 13, 15       -          +          -          185  200  200  215     200.00
6, 8, 14, 16       +          +          -          162  172  164  118     154.00
17, 19, 25, 27     -          -          +          106   88  101  140     108.75
18, 20, 26, 28     +          -          +          106   98  114   72      97.50
21, 23, 29, 31     -          +          +          113  166  130  145     138.50
22, 24, 30, 32     +          +          +           88   79   83  110      90.00
If we look at the three factors which do influence the growth of Penicillium chrysogenum, we see that there are 32 trials, but we know that only 8 trials are required to study three factors. We can therefore group together the trials having the same levels for factors 1, 3 and 5, regardless of the levels of 2 and 4. For example, trials 1, 3, 9 and 11 were carried out at the low level of factors 1, 3 and 5, so that the results of these four trials should be the same, allowing for experimental error. The 32 trials are used as if four 2^3 designs had been performed. Table 3.8 shows the rearrangement of trials and the mean responses for each group. Thus, it appears as if a three-factor design was repeated four times.
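The grouping can be sketched as follows, again assuming the 32 responses are in standard trial order. The variable names are illustrative; the key of each group is the triple of levels taken by factors 1, 3 and 5.

```python
from collections import defaultdict

# Responses of trials 1..32 in standard order.
responses = [
    142, 114, 129, 109, 185, 162, 200, 172,
    148, 108, 146,  95, 200, 164, 215, 118,
    106, 106,  88,  98, 113,  88, 166,  79,
    101, 114, 140,  72, 130,  83, 145, 110,
]

groups = defaultdict(list)
for index, y in enumerate(responses):
    # Bits 0, 2 and 4 of the 0-based trial index hold the levels of
    # factors 1, 3 and 5; factors 2 and 4 (bits 1 and 3) are ignored.
    key = tuple("+" if (index >> bit) & 1 else "-" for bit in (0, 2, 4))
    groups[key].append((index + 1, y))

for key, members in groups.items():
    numbers = [number for number, _ in members]
    average = sum(y for _, y in members) / len(members)
    print(key, numbers, average)
```

Each of the eight groups contains four trials, which is why the 32 trials behave like a three-factor design repeated four times.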
TABLE 3.9 TABLE OF EFFECTS PENICILLIUM CHRYSOGENUM GROWTH MEDIUM

Mean     129.6 ± 6

1        -17.6 ± 6
3         16.1 ± 6
5        -20.9 ± 6

13        -6.1 ± 6
15         2.6 ± 6
35       -10.5 ± 6

135       -3.2 ± 6
The experimental domain is reduced to a cube for the three influencing factors. We can therefore introduce the mean of each response at each corner of the cube to facilitate interpretation (Figure 3.7).
Figure 3.7: Diagram showing the results of the trials on Penicillium chrysogenum medium.
A high percentage of corn liquor (factor 1) evidently reduced the yield of penicillin. At a low level of factor 1, the yield was clearly improved by the addition of precursor and the absence of glucose. The presence of glucose reduced the effectiveness of the precursor.
Results:
Under the experimental conditions used, the best yield of penicillin is obtained:
• with a low (2%) concentration of corn liquor,
• with precursor,
• without glucose, which reduces the yield and inhibits the precursor.
Figure 3.8: Influence of precursor and glucose at a corn liquor concentration of 2%.
The interaction 12345 appears to be too great, and we will look at the reason for this in the chapter on blocking (Chapter 10).
9. COMPLETE DESIGNS WITH k FACTORS: 2^k
We have seen that 2^2, 2^3 and 2^5 designs can be used to study two, three or five factors. A 2^k design can be used when there are more factors, with k having any desired size. The experimental matrix and the effects matrix are constructed according to the same rules as were used previously. The calculations of the k main effects and the 2^k - k - 1 interactions are performed in the same way. There is thus no theoretical limit to the number k of factors that may be studied. But in practice the number of trials needed quickly becomes very large. A total of 2^7 (128) trials are required to study only 7 factors. This is a considerable number, and is rarely compatible with the facilities generally available in industry or university. This brings us to a most troublesome problem. We must find a way of reducing the number of trials without reducing the number of factors studied. We will examine this problem in Chapter 6.
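How quickly the number of trials grows can be seen with a small sketch; `design_size` is a hypothetical helper that simply applies the counts given above.

```python
def design_size(k):
    """Trials, main effects and interactions of a complete 2^k design."""
    trials = 2 ** k
    main_effects = k
    # Everything that is neither the mean nor a main effect is an interaction.
    interactions = trials - k - 1
    return trials, main_effects, interactions


for k in (2, 3, 5, 7, 10):
    trials, main_effects, interactions = design_size(k)
    print(f"k={k:2d}: {trials:5d} trials, {main_effects:2d} main effects, "
          f"{interactions:5d} interactions")
```

Each added factor doubles the number of trials, while most of the extra quantities estimated are high-order interactions.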
10. THE EFFECTS MATRIX AND MATHEMATICAL MATRIX
The effects can be calculated from the experimental results using an effects matrix which, for a 2^2 experiment, is shown in Table 3.10.

TABLE 3.10

Trial no    Mean    Factor 1    Factor 2    Interaction 12
1           +1      -1          -1          +1
2           +1      +1          -1          -1
3           +1      -1          +1          -1
4           +1      +1          +1          +1
This array of numbers can be used for a calculation; it is thus a mathematical tool, a matrix. It can be written:
$$X = \begin{bmatrix} +1 & -1 & -1 & +1 \\ +1 & +1 & -1 & -1 \\ +1 & -1 & +1 & -1 \\ +1 & +1 & +1 & +1 \end{bmatrix}$$
A mathematical matrix is simply a table containing elements (here they are numbers) arranged in rows and columns. When the number of rows equals the number of columns the matrix is said to be square; otherwise it is rectangular. A matrix may contain just a single row and several columns (a linear matrix) or a single column and several rows (a column matrix, or vector matrix). We will use matrices to express experimental results. They may be shown in a rather special table because it contains only one column. The response vector matrix is:
$$Y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}$$
An analogous matrix can be written for the effects:
$$E = \begin{bmatrix} I \\ E_1 \\ E_2 \\ E_{12} \end{bmatrix}$$
Before we use these matrices we will examine the operations which can be performed on a single matrix, or between matrices themselves. The operations we need are transposition for a single matrix, and matrix multiplication for two or more matrices.
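The transpose of X, written X^t, is obtained by exchanging its rows and columns:
$$X^t = \begin{bmatrix} +1 & +1 & +1 & +1 \\ -1 & +1 & -1 & +1 \\ -1 & -1 & +1 & +1 \\ +1 & -1 & -1 & +1 \end{bmatrix}$$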
The second operation we will need is the multiplication of two (or more) matrices.
10.2. Matrix multiplication
Any reader not familiar with matrix calculations should read Appendix 1 before continuing with this chapter. If we multiply matrix X^t by Y we get:
$$X^t Y = \begin{bmatrix} +1 & +1 & +1 & +1 \\ -1 & +1 & -1 & +1 \\ -1 & -1 & +1 & +1 \\ +1 & -1 & -1 & +1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}$$
The first element of the matrix-product is:
$$\left[+y_1 + y_2 + y_3 + y_4\right]$$
or four times the mean of the responses. Similarly the second element is:
$$\left[-y_1 + y_2 - y_3 + y_4\right]$$
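or four times $E_1$, the effect of factor 1. The calculations for the results of the third and fourth elements of the matrix-product are similar. We can therefore write:
$$\begin{bmatrix} +1 & +1 & +1 & +1 \\ -1 & +1 & -1 & +1 \\ -1 & -1 & +1 & +1 \\ +1 & -1 & -1 & +1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = 4 \begin{bmatrix} I \\ E_1 \\ E_2 \\ E_{12} \end{bmatrix}$$
which can be condensed to
$$X^t Y = 4E$$
or
$$E = \frac{1}{4} X^t Y$$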
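The matrix calculation can be tried on a small numerical case. The sketch below is an illustration only: it assumes, as responses of a 2^2 design, the four averages of the rearranged bitumen results (37.5, 25.0, 29.0, 17.5), so that "factor 1" of this 2^2 design is HCl and "factor 2" is the bitumen type. The helpers `transpose` and `matvec` are hypothetical names written in plain Python.

```python
# Effects matrix X of the 2^2 design
# (columns: mean, factor 1, factor 2, interaction 12).
X = [
    [+1, -1, -1, +1],
    [+1, +1, -1, -1],
    [+1, -1, +1, -1],
    [+1, +1, +1, +1],
]


def transpose(m):
    """Exchange the rows and columns of a matrix stored as a list of rows."""
    return [list(column) for column in zip(*m)]


def matvec(m, v):
    """Product of a matrix by a column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Assumed responses y1..y4 (averages of the rearranged bitumen trials).
Y = [37.5, 25.0, 29.0, 17.5]
n = len(Y)

# E = (1/n) X^t Y
E = [value / n for value in matvec(transpose(X), Y)]
print(E)
```

The four elements of `E` are the mean, the two main effects and the interaction, in the order of the columns of X; with these assumed responses they reproduce the mean and the effects of factors 2, 3 and 23 found for the bitumen emulsion.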
This relationship for a 2^2 design can be extended to all two-level complete factorial designs. When n is the number of trials we have
$$E = \frac{1}{n} X^t Y$$
We now have, in the form of a matrix, the technique we used to calculate the effects and interactions of 2^k designs. The matrix form clearly shows that the experimental responses y_i have been transformed by the matrix X^t so as to be more readily interpreted. A factor increases (or reduces) the mean of responses I by a quantity equal to its effect. In the first example we examined, the yield of a chemical reaction, the four responses 60%, 70%, 80% and 90% were difficult to interpret. But when they are transformed by the matrix X^t, the effect of each factor is obtained as if it were alone. A 10°C rise in temperature increases the yield from 75% to 80%, while a pressure rise of 0.5 bar increases the yield from 75% to 85%.
When the raw responses of a 2^k design are examined it is impossible to distinguish the influence of each factor. But the transformation by the X^t matrix displays the useful information in the set of responses more clearly, revealing the effect of each factor as if it were alone. As the matrix X is the mathematical translation of the location of the experimental points, it is clearly most important that these points should be optimally placed in the experimental domain. Poorly positioned experimental points obscure the information instead of highlighting it. Well positioned experimental points clarify the information (Chapter 16).
The analysis of the specific X matrices which are used in all two-level factorial designs can be developed a little. These are the Hadamard matrices, and they have quite remarkable properties. Let us first calculate the inverse of the X matrix and then examine the product of X and its transpose, X^t X.
10.3. Inverse of X
The calculation of the inverse of a matrix is complicated for the general case, requiring a computer for high order matrices. But the calculation for X matrices of factorial designs (Hadamard matrices) is greatly simplified because of the following relationship:
$$X^{-1} = \frac{1}{n} X^t$$
The inverse of X can be obtained by transposing X and dividing all the elements of the X^t matrix by n, the number of trials. The relationship
$$X^t Y = nE$$
then becomes
$$X^{-1} Y = E$$
or
$$Y = XE$$
This formula can be used to calculate the responses from the effects.
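Both statements can be checked numerically on the 2^2 design. The sketch below is illustrative: the effect values are assumed (mean 27.25, effects -6 and -4, interaction 0.25, taken from the bitumen example restricted to HCl and bitumen type), and `X_inv` and `identity_check` are names introduced here.

```python
# Effects matrix X of the 2^2 design
# (columns: mean, factor 1, factor 2, interaction 12).
X = [
    [+1, -1, -1, +1],
    [+1, +1, -1, -1],
    [+1, -1, +1, -1],
    [+1, +1, +1, +1],
]
n = len(X)

# Assumed effects vector E = (I, E1, E2, E12).
E = [27.25, -6.0, -4.0, 0.25]

# Y = X E: each response is the mean plus or minus each effect,
# with the signs given by the corresponding row of X.
Y = [sum(x * e for x, e in zip(row, E)) for row in X]
print(Y)

# Inverse of X obtained as (1/n) X^t: transpose, then divide by n.
X_inv = [[value / n for value in column] for column in zip(*X)]

# Multiplying X_inv by X should give the unit matrix.
identity_check = [
    [sum(a * b for a, b in zip(row, column)) for column in zip(*X)]
    for row in X_inv
]
print(identity_check)
```

The computed `Y` gives back the four responses from which these effects were obtained, which is the sense in which the formula calculates the responses from the effects.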
The product of X^t and X is:
$$X^t X = \begin{bmatrix} +1 & +1 & +1 & +1 \\ -1 & +1 & -1 & +1 \\ -1 & -1 & +1 & +1 \\ +1 & -1 & -1 & +1 \end{bmatrix} \begin{bmatrix} +1 & -1 & -1 & +1 \\ +1 & +1 & -1 & -1 \\ +1 & -1 & +1 & -1 \\ +1 & +1 & +1 & +1 \end{bmatrix} = \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 4 \end{bmatrix} = 4 \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
which can be condensed to
$$X^t X = 4I$$
where I here denotes the unit matrix. For this design, the product of the matrix of effects by its matrix transpose is 4 times the unit matrix. The general form of the formula for all two-level complete factorial designs is
$$X^t X = nI$$
where n is the number of trials. The matrix X^t X is equal to n times the unit matrix in the case of two-level factorial designs. It can be demonstrated that, in this case, the precision obtained for the effects is the best that might be hoped for (see Chapter 5). Experiments in which a two-level complete factorial design is used are certain to provide calculated effects with maximum precision.
10.5. Measurement units
The responses y_i were measured with a unit (metre, centimetre, volt, etc.) or a less usual unit, such as an index, percentage or variance. The matrix X^t does not change the unit in which y_i is measured; it simply transforms the trial results into a system that is easier to interpret. As a result, the mean, the main effects and the interactions are evaluated in the unit used to measure the responses.
RECAPITULATION
We have used the example of bitumen emulsion stability to:
• Extend the concepts acquired with the 2^2 design to a 2^3 design.
• Extend the concept of interaction.
• Examine the rules for calculating effects and interactions from an effects matrix.
• Introduce Box notation.
• Present the results as a table of effects.
• Show that both continuous variables (temperature and pressure) and discrete variables (type of bitumen) can be studied simultaneously within the same experimental design.
Using a 2^5 design allowed us to:
• Apply the principles acquired to a real case.
• Use the fact that some factors were without influence to construct a replicate factorial design.
The mathematical matrix representation of two-level factorial designs was used to:
• Calculate effects, interactions and mean from the responses.
• Introduce transposed matrices, product matrices and vector matrices.
• Simplify the interpretation of results, which are transformed into mean, effects and interactions.
• Guarantee that the effects and interactions calculated have the highest possible precision.
• Define the units for measuring effects and interactions.