Biochemical Engineering Journal 21 (2004) 199–205
Two-stage aqueous two-phase extractions: selection of system composition using a genetic algorithm

J. Bleier a, M. Syddall b, A. Lyddiatt b, X.D. Chen a,∗

a Department of Chemical and Materials Engineering, The University of Auckland, Private Bag 92019, Auckland, New Zealand
b Biochemical Recovery Group, Centre for Bioprocess Engineering, School of Chemical Engineering, University of Birmingham, Edgbaston, Birmingham B15 TT, UK

Received 18 February 2002; accepted 24 May 2004
Abstract

A genetic algorithm (GA) was applied to search for optimal protein recovery from two two-stage aqueous two-phase (A2P) extractions. The first contained only BSA and the second contained citrated whole bovine blood. This work shows that GAs can be applied to the problem of obtaining an effective combination of forward and back extractions for the recovery of proteins from a biological material. The GA performed the task in a short time and required little prior knowledge of the behaviour of the system.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Aqueous two-phase systems; Back extraction; Genetic algorithm
∗ Corresponding author: X.D. Chen. Tel.: +64 9 373 7599x7004; fax: +64 9 373 7463.
1369-703X/$ – see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.bej.2004.05.011

1. Introduction

1.1. Aqueous two-phase partitioning

An aqueous two-phase (A2P) extraction process for the recovery of proteins from biological material typically involves a forward extraction, performed by combining phase-forming components with the biological material. Conditions are created in which the molecule of interest partitions preferentially to the top, polymer-rich phase and unwanted material to the bottom phase. The top phase is removed and a back extraction, to separate the polymer from the protein, can be performed by a further addition of phase-forming material. This time, conditions are created so that the bottom phase contains the protein, creating the potential for recycling the polymer [1].

Aqueous two-phase extractions have found application in a small number of industrial-scale purification processes where the object is to separate proteins from biomass [2]. Very few of these employ a back extraction to recover the polymer. This step considerably reduces the cost of the process, so why has it not been employed more often? Indeed, why are there so few documented instances of production-scale implementation of this technology, even without a back extraction?

One answer to these questions is that designing an A2P separation process requires consideration of many variables. The choices are made difficult because there is no clear predictive model to describe phase separation and partitioning behaviour [3]. Adding a back extraction compounds the problem by orders of magnitude, both because it introduces many more possible combinations of conditions and because the environment experienced by a protein in a forward extraction can influence the way it behaves in a back extraction. Without rules of thumb, heuristic searches for optimum performance could take several lifetimes. Even when such rules are employed, optimising an extraction process can still take considerable time.

Because of the difficulty of predicting the behaviour of molecules within A2P systems, a number of groups have investigated experimental methods of optimisation [4,5]. Recently, Selber et al. [6] tested three mathematical strategies to optimise extraction of a recombinant protein from a bacterial homogenate. They concluded that the genetic algorithm was an optimisation procedure well suited to A2P systems.

1.2. Genetic algorithms

Genetic algorithms (GAs) are robust, probabilistic search techniques based on mechanisms observed in biological evolutionary processes. Their basic concepts were first set forth in a 1975 monograph by Holland, which has since been republished [7], and an extensive description of both the theory underlying their operation and examples of their application is given in a 1989 book by Goldberg [8]. GAs evolve populations of chromosomes composed of encoded parameters, usually in binary, to search for an optimal region. Three genetic operators, selection, recombination and mutation, define the manner in which GAs work.

• Selection: the fitness of the individuals is evaluated, and each individual is assigned a probability of mating relative to its fitness.
• Recombination is the crossing-over of two individuals in the probabilistically determined mating pool. Both strings are “cut” at a random location along the string and the fragments exchanged to produce two new individuals.
• Mutation causes the random change of bits in the chromosome and is applied probabilistically.

Selection drives the search toward the optimum while recombination broadens the scope of the directed search. Mutation preserves genetic variation, decreasing the chance of the algorithm terminating at a local, non-global optimum.

Since GAs make use only of the fitness of coded strings, no assumptions of continuity or smoothness are required. In the texts on GAs [7,8], the strings are often regarded as representing “black box” functions, returning values for which the relationship with the input values is poorly characterised or completely unknown.

1.3. Convergence properties and genetic algorithm extensions for small populations

Much of the history of GAs has focused on computer-based simulations, carried out for many generations using large populations. It has been shown that GAs that retain the best individuals found to date within the latest search population always converge, in probability, to the optimum [9]. Unfortunately, for experimental purposes, such a guarantee of convergence requires an inordinately large number of experiments. A number of modifications have been proposed for use with small search populations [10], such as the use of complementary individuals, in which an individual from the population is replaced with the bitwise complement of another member of the population, selected at random.
It has been suggested as a mechanism for maintaining the diversity of alleles within a population, and thereby reducing the risk of premature convergence to a non-global optimum [11]. GAs have been used in conjunction with practical experimentation prior to their application to A2P systems. An example of this is the search for optimal conditions for fermentations [12].
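The operators described in Section 1.2 and the complementary-individual modification of Section 1.3 can be illustrated with a minimal Python sketch. It assumes chromosomes are lists of 0/1 integers and uses fitness-proportionate (roulette-wheel) selection, which is only one way of assigning mating probability relative to fitness. All function names are hypothetical and none of this is taken from the toolbox used in the paper.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness.

    Assumption: fitness-proportionate selection; the paper does not state
    which selection scheme was used.
    """
    if sum(fitnesses) == 0:
        # No individual has any fitness yet: fall back to a uniform choice.
        return rng.choice(population)
    return rng.choices(population, weights=fitnesses, k=1)[0]


def single_point_crossover(parent_a, parent_b, rng):
    """Cut both strings at one random location and exchange the fragments."""
    cut = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def complement(bits):
    """Bitwise complement, used to build a complementary individual."""
    return [1 - b for b in bits]


def inject_complement(population, rng):
    """Replace one member with the complement of another, chosen at random."""
    source, target = rng.sample(range(len(population)), 2)
    new_population = list(population)
    new_population[target] = complement(population[source])
    return new_population
```

Passing an explicit `random.Random` instance keeps the sketch reproducible, which matters when each evaluation is a laboratory experiment rather than a function call.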
2. Methods and materials

All reagents were obtained from Sigma except citrated whole bovine blood, which was supplied by Life Technologies NZ Ltd.

2.1. Determination of protein recovery

Protein concentration was determined using a Pierce MicroBCA Assay kit according to the manufacturer’s instructions. All samples were diluted at least 100-fold prior to being assayed. The packed cell volume was determined using a haematocrit.

Recovery of protein in the top phase of the forward extraction of systems that contained pure BSA was calculated using Eq. (1).

$$R_t = \frac{C_t V_t}{M_a} \times 100 \tag{1}$$

where $R_t$ is the percentage recovery, $C_t$ the concentration of protein in the top phase, $V_t$ the volume of the top phase and $M_a$ the amount of BSA added to the system initially.

Recovery of protein in the bottom phase of the back extraction from systems loaded with pure BSA was calculated using Eq. (2).

$$R_b = \frac{C_b V_b (V_{tf}/V_{ta})}{M_a} \times 100 \tag{2}$$

where $R_b$ is the percentage recovery, $C_b$ the concentration of protein in the bottom phase, $V_b$ the volume of the bottom phase, $V_{tf}$ the volume of the top phase of the forward extraction and $V_{ta}$ the volume of the top phase of the forward extraction that was used to make the back extraction.

For systems that contained whole blood, protein recovery in the top phase of the forward extraction was calculated using Eq. (3).

$$R_t = \frac{A_{562t} V_t}{(M_{bl}/SG_{bl})(1 - \mathrm{PCV}) A_{562p}} \times 100 \tag{3}$$

where $R_t$ is the percentage recovery, $A_{562t}$ the absorbance at 562 nm obtained from the micro BCA assay of a diluted sample of the top phase (multiplied by the dilution factor), $V_t$ the volume of the top phase, $M_{bl}$ the mass of blood added to the system, $SG_{bl}$ the specific gravity of bovine blood [13], PCV the packed cell volume of the blood and $A_{562p}$ the absorbance at 562 nm obtained from the micro BCA assay of a diluted sample of bovine plasma (multiplied by the dilution factor). Plasma was prepared from the same blood used to load the system.

For systems containing whole blood, the protein recovery in the bottom phase of the back extraction was calculated using Eq. (4).

$$R_b = \frac{A_{562b} V_b (V_{tf}/V_{tg})}{(M_{bl}/SG_{bl})(1 - \mathrm{PCV}) A_{562p}} \times 100 \tag{4}$$

where $R_b$ is the percentage recovery, $A_{562b}$ the absorbance at 562 nm obtained from the micro BCA assay of a diluted sample of the bottom phase (multiplied by the dilution factor), $V_b$ the volume of the bottom phase, $V_{tf}$ the volume of the top phase of the forward extraction and $V_{tg}$ the volume of the top phase of the forward extraction used to make the back extraction.

For all of the two-stage extractions, overall recovery was calculated using Eq. (5).

$$R_{tot} = R_t R_b \times 100 \tag{5}$$

Table 1
Genes and alleles for BSA partitioning experiment

Gene                                                  Alleles
1. PEG Mr                                             300, 400, 600, 1000, 1450 and 3350
2. PEG (%, w/w)                                       5–20
3. Phosphate (%, w/w)                                 5–20
4. pH of the forward extraction                       6, 6.5, 7, 7.5 and 8
5. NaCl (%)                                           0–6
6. Top phase from the forward extraction (%, w/w)     5–50
7. Phosphate (%, w/w)                                 5–20
8. pH                                                 6, 6.5, 7, 7.5 and 8
9. NaCl (%)                                           0–6

Genes 1–5 refer to the forward extraction. Genes 6–9 refer to the back extraction. Gene 6 refers to the amount of the top phase from the forward extraction in the back extraction.

Table 2
Genes and alleles for the extraction of plasma proteins from whole blood

Gene                                                  Alleles
1. PEG Mr                                             300, 400, 600, 1000, 1450 and 3350
2. PEG (%, w/w)                                       5–20
3. Phosphate (%, w/w)                                 5–20
4. pH                                                 6, 6.5, 7, 7.5 and 8
5. Top phase from the forward extraction (%, w/w)     5–50
6. Phosphate (%, w/w)                                 5–20
7. pH                                                 6, 6.5, 7, 7.5 and 8
8. Whole bovine blood (%, w/w)                        0–30

Genes 1–4 and 8 refer to the forward extraction. Genes 5–7 refer to the back extraction. Gene 5 refers to the amount of the top phase of the forward extraction in the second extraction.
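The recovery calculations of Eqs. (1)–(4) in Section 2.1 can be sketched in Python as follows. This is an illustrative transcription only: the function and argument names are hypothetical, units are assumed to be mutually consistent (for example mg/mL, mL and mg), and Eq. (5) is deliberately left out.

```python
def recovery_top_bsa(c_t, v_t, m_a):
    """Eq. (1): percentage of the added BSA found in the forward top phase."""
    return c_t * v_t / m_a * 100.0


def recovery_bottom_bsa(c_b, v_b, v_tf, v_ta, m_a):
    """Eq. (2): percentage recovery in the back-extraction bottom phase.

    The factor v_tf / v_ta scales the result from the aliquot of top phase
    actually used (v_ta) to the whole forward top phase (v_tf).
    """
    return c_b * v_b * (v_tf / v_ta) / m_a * 100.0


def plasma_protein_reference(m_bl, sg_bl, pcv, a562_p):
    """Shared denominator of Eqs. (3) and (4).

    m_bl / sg_bl is the blood volume, (1 - pcv) the plasma fraction, and
    a562_p the dilution-corrected absorbance of the plasma sample.
    """
    return (m_bl / sg_bl) * (1.0 - pcv) * a562_p


def recovery_top_blood(a562_t, v_t, m_bl, sg_bl, pcv, a562_p):
    """Eq. (3): percentage of plasma protein in the forward top phase."""
    reference = plasma_protein_reference(m_bl, sg_bl, pcv, a562_p)
    return a562_t * v_t / reference * 100.0


def recovery_bottom_blood(a562_b, v_b, v_tf, v_tg, m_bl, sg_bl, pcv, a562_p):
    """Eq. (4): percentage of plasma protein in the back bottom phase."""
    reference = plasma_protein_reference(m_bl, sg_bl, pcv, a562_p)
    return a562_b * v_b * (v_tf / v_tg) / reference * 100.0
```

With made-up inputs, `recovery_top_bsa(2.0, 4.0, 10.0)` corresponds to 8 units of protein recovered out of 10 added.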
3. The genetic algorithm

A Genetic Algorithm Tool Box for MATLAB [14] was used.
Fig. 1. The model function used to test coding alphabets, mutational operators and varying population sizes with the genetic algorithm.
3.1. The structure of the chromosomes

Table 1 shows the genes and alleles for the chromosomes that described targeted partitioning of BSA to the bottom phase of a back extraction. These systems were loaded with 0.1% (w/w) BSA. Table 2 shows the genes and alleles for the chromosomes that described the extraction of plasma proteins from citrated whole bovine blood.

Gene 6 in Table 1 and Gene 5 in Table 2 refer to the percentage of the back extraction that was made up of the top phase from the forward extraction. In the actual experiment, this gene also controlled the size of the back extraction. For example, if the algorithm required a phenotype of 5% for this value, and 1 g of top phase was recovered from the forward extraction, the total mass of the back extraction was 20 g. This approach also enabled the total mass of the back extraction to be controlled and kept to a practical working size by taking a smaller aliquot. To maintain comparability and to determine the maximum capability of the system, recovery was scaled in Eqs. (2) and (4) to allow for the rest of the top phase.

3.2. Coding alphabet, population size, and mutational operators

To assist with testing various alphabets, population sizes and mutational operators for the GA, a model function was constructed to simulate the effect of each allele on the recovery value, using the chromosome structure in Table 1 (Fig. 1). The algorithm was made incestuous by the retention of the two best individuals from the previous generation, and an additional mutational operation, bitwise complementation as described by Reeves [10], was employed. Gray coding was used as the alphabet for coding the chromosome. These choices were made on the basis of trials with the model function, in which they provided the best chance of convergence to the optimum values. In 100 trials, the optimum was located within 25 generations 80% of the time.
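The decoding of a Gray-coded gene and the sizing of the back extraction described above can be sketched as follows. The Gray-to-integer conversion is standard; the linear mapping onto an allele range, the modulo mapping onto a discrete allele list, and the function names are assumptions made for illustration, since the paper does not state the bit lengths or the mapping used by the toolbox.

```python
def gray_to_int(bits):
    """Decode a Gray-coded bit list (most significant bit first)."""
    value = 0
    running = 0
    for bit in bits:
        running ^= bit  # running XOR recovers the plain binary digit
        value = (value << 1) | running
    return value


def decode_range(bits, low, high):
    """Map a Gray-coded gene linearly onto [low, high] (assumed mapping)."""
    max_int = (1 << len(bits)) - 1
    return low + (high - low) * gray_to_int(bits) / max_int


def decode_choice(bits, options):
    """Map a Gray-coded gene onto a discrete allele list (assumed mapping)."""
    return options[gray_to_int(bits) % len(options)]


def back_extraction_mass(top_phase_mass_g, top_phase_percent):
    """Total back-extraction mass when the top phase makes up the given %."""
    return top_phase_mass_g * 100.0 / top_phase_percent


# The worked example from the text: 1 g of top phase at 5% gives 20 g total.
print(back_extraction_mass(1.0, 5.0))  # 20.0
```

Gray coding is attractive here because adjacent integer values differ in a single bit, so a one-bit mutation can move a composition to a neighbouring value.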
4. Experimental procedure

A flow-chart depicting the order of events is shown in Fig. 2. Initially, a random population of 20 chromosomes was generated by the GA software [14] and decoded into phenotypes. The corresponding systems were constructed in the laboratory and assayed for protein recovery. The recovery values served as the objective function values and were entered into the GA software, which calculated the next generation of chromosomes. These were decoded in turn, and the cycle was repeated until the termination criteria had been met. The algorithm was terminated when 95% recovery was achieved or 25 generations had been performed.
Fig. 2. Flow-chart showing the cycle of the GA and the experiment. Rectangular boxes are GA steps (performed by the software) and oval boxes are laboratory steps (performed by the experimenter).
The following conditions were allocated an objective function value of zero:

• systems where either extraction was monophasic;
• systems where precipitation was present in the top phase;
• systems with forward extractions that had cells in the top phase.

Clearly, monophasic systems would not have served any useful purpose. Conditions that left solid material, either precipitate or blood cells, in the top phase of either extraction were selected against because this would complicate, and possibly even prevent, the use of a liquid–liquid separator in a scale-up of the process.
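The zero-valued conditions and the termination criteria can be expressed compactly. The sketch below assumes each laboratory result is recorded as a small record with boolean flags; the class, field and function names are hypothetical and are not part of the GA software used in the study.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One assayed two-stage system (hypothetical record layout)."""
    recovery_percent: float
    monophasic: bool = False          # either extraction failed to form two phases
    precipitate_in_top: bool = False  # precipitation present in a top phase
    cells_in_top: bool = False        # blood cells in the forward top phase


def objective(obs):
    """Objective function value: recovery, or zero for unusable systems."""
    if obs.monophasic or obs.precipitate_in_top or obs.cells_in_top:
        return 0.0
    return obs.recovery_percent


def should_terminate(best_recovery, generation, target=95.0, max_generations=25):
    """Stop at 95% recovery or after 25 generations, as stated in the text."""
    return best_recovery >= target or generation >= max_generations
```

Folding the practical constraints into the objective in this way means the algorithm needs no separate constraint handling: unusable systems simply receive no mating probability under fitness-proportionate selection.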
Fig. 3. Distribution of the recovery values obtained in each generation of the genetic algorithm that searched for optimal recovery of BSA in the bottom phase of the back extraction of a two-stage aqueous two phase extraction.
5. Results and discussion

The scale of the task of locating a suitable set of conditions should not be underestimated. The genes and alleles in Table 1 coded for 1,384,857,600 phenotype combinations. An exhaustive search of this set, making 100 systems per day and working 7 days per week, would have taken 38,045.5 years.

The number of possibilities could, of course, have been reduced dramatically by using “rules of thumb” concerning A2P systems [15]. One way of incorporating these principles would have been to create a more complicated algorithm whose procedure for generating new systems made use of current knowledge about the behaviour of A2P systems. A simpler implementation might have been to narrow the allele range for many of the genes; for example, forward extractions might have been confined to the left-hand side of the phase diagram and back extractions to the right-hand side. These were not included in this work because the intention was to demonstrate that a useful system could be located without a great deal of prior knowledge about A2P systems.

Commercial implementation often requires rapid development. If a technology can swiftly produce a preliminary result, it has a much greater chance of obtaining support within a company environment. Often there is simply not time to become completely conversant with all the literature, and methods will be overlooked in favour of something that is more easily implemented.

The GA located a two-stage extraction system that had the potential to recover greater than 90% of the BSA added to it. This was achieved in the third generation of the algorithm. As can be seen in Fig. 3, the recovery values from subsequent generations converged towards the desired value. Fig. 4 shows the distribution of alleles for each gene in the final generation of the algorithm. In the cases of 4B, 4C and 4G, the alleles were grouped into bins of 2%, and 4F was grouped into bins of 5%. These plots show clustering in most alleles, which is a good indication that a maximum has been found.

In Fig. 4, Graph A shows the number of individuals having each PEG molecular mass. Graph B shows the distribution of PEG compositions of the forward extractions, grouped into bins of 2%. Graph C shows the distribution of phosphate compositions of the forward extractions, grouped into bins of 2%. Graph D shows the distribution of pH values of the forward extractions. Graph E shows the distribution of NaCl compositions in the forward extractions. Graph F shows the amounts of the back extractions, grouped into bins of 5%, that were comprised of aliquots of the top phase of the forward extractions. Graph G shows the compositions of phosphate in the back extractions, grouped into bins of 2%, that were added to make the systems biphasic. Graph H shows the distribution of pH values of the phosphate solution added to the back extractions. Graph I shows the distribution of compositions of NaCl added to the back extractions.

Fig. 4. Distribution of each allele in the final generation of a genetic algorithm that had searched for optimal recovery of BSA in the bottom phase of the back extraction of a two-stage A2P extraction.

Working with systems that contained whole blood proved a greater challenge for the GA, which was unable to find a system that recovered any protein until generation 9. After that, as can be seen in Fig. 5, it proceeded quite satisfactorily to obtain a system that recovered around 50% of the plasma protein from whole blood in 23 generations.

Fig. 5. Distribution of the recovery values obtained in each generation of the genetic algorithm that searched for optimal recovery of plasma protein in the bottom phase of the back extraction of a two-stage aqueous two phase extraction.

Fig. 6 shows the distribution of alleles for each gene in the final generation of the algorithm. In the cases of 6B, 6C, 6E and 6G, the alleles were grouped into bins of 2%, and 6F was grouped into bins of 5%. These plots show clustering in most alleles, which is a good indication that a maximum has been found.

In Fig. 6, Graph A shows the number of individuals having each PEG molecular mass. Graph B shows the distribution of PEG compositions of the forward extractions. Graph C shows the distribution of phosphate compositions of the forward extractions. Graph D shows the distribution of pH values of the forward extraction. Graph E shows the distribution of whole citrated bovine blood compositions in the forward extraction. Graph F shows the amounts of the back extractions that were comprised of an aliquot of the top phase of the forward extractions. Graph G shows the composition of phosphate in the back extraction that was added to make the system biphasic. Graph H shows the distribution of pH values of the phosphate solutions added to the back extractions.

The current study employed the ‘protein recovery’ after the ‘two-stage extraction’ as the parameter targeted by the GA procedure. Perhaps more fundamental ‘intermediate’ parameters, such as the partition coefficients, should be targeted. This may make the GA approach more general, so that it could be applied to a wider range of applications.
Fig. 6. Distribution of each allele in the final generation of a genetic algorithm that had searched for optimal recovery of plasma protein from whole bovine blood in a two-stage A2P extraction.
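The exhaustive-search estimate quoted at the start of this section can be checked with a few lines of arithmetic. The sketch below assumes a working year of 52 weeks of 7 days (364 days), which is an inference: it is the convention that reproduces the quoted figure, and it is not stated in the text. The function name is hypothetical.

```python
def exhaustive_search_years(combinations, systems_per_day, days_per_year):
    """Years needed to make and assay every phenotype combination once."""
    return combinations / systems_per_day / days_per_year


combinations = 1_384_857_600  # phenotype combinations coded by Table 1
systems_per_day = 100

# Assumption: a year counted as 52 weeks of 7 working days.
years = exhaustive_search_years(combinations, systems_per_day, 7 * 52)
print(round(years, 1))  # 38045.5
```

By contrast, the BSA search reported above reached greater than 90% recovery in its third generation of 20 systems each, that is, after at most 60 systems had been assayed.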
6. Conclusion

It is rather difficult to assess the absolute success or failure of a GA search for a global optimum unless the entire search space has been characterised, but if that were the case, searching in this manner would be unnecessary. This work shows that GAs can be applied to the problem of obtaining an effective combination of forward and back extractions for the recovery of proteins from a biological material. The approach also allowed the inclusion of criteria that would enable the systems it located to be used for large-scale protein extractions that could be processed with liquid–liquid separators. The GA performed the task in a short time and required little prior knowledge of the behaviour of the system.
Acknowledgements

The authors wish to acknowledge the financial support of FRST via a Graduate Research in Industry Fellowship and the industry partner for the fellowship, Life Technologies (NZ) Ltd. We are also grateful for the advice from Dr. Neil Pattinson.

References

[1] P.-A. Albertsson, Partitioning of Cell Particles and Macromolecules, third ed., Wiley, New York, 1986.
[2] H. Walter, D. Brooks, D. Fisher, Partitioning in Aqueous Two-Phase Systems: Theory, Methods, Uses and Applications to Biotechnology, Academic Press, Orlando, 1985.
[3] H.O. Johansson, G. Karlstrom, F. Tjerneld, C.A. Haynes, Driving forces for phase separation and partitioning in aqueous two-phase systems, J. Chromatogr. B, Biomed. Sci. Appl. 711 (1–2) (1998) 3–17.
[4] G.M. Zijlstra, C.D. de Gooijer, J. Tramper, Extractive bioconversions in aqueous two-phase systems, Curr. Opin. Biotechnol. 9 (1998) 171–176.
[5] R.A. Hart, J.R. Ogez, S.E. Builder, Use of multifactorial analysis to develop aqueous two-phase systems for isolation of non-native IGF-I, Bioseparation 5 (1995) 113–121.
[6] K. Selber, F. Nellen, B. Steffen, J. Thömmes, M.-R. Kula, Investigation of mathematical methods for efficient optimisation of aqueous two-phase extraction, J. Chromatogr. B, Biomed. Sci. Appl. 743 (2000) 21–30.
[7] J.H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, MIT Press, 1992 (first ed. 1975).
[8] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley Publishing Company, Reading, MA, 1989 (Reprinted with corrections January 1989).
[9] L. Yao, A. William, Nonlinear parameter estimation via the genetic algorithm, IEEE Trans. Signal Process. 42 (1994) 927–937.
[10] C.R. Reeves, Using genetic algorithms with small populations, in: Proceedings of the Fifth International Conference on Genetic Algorithms, 1993, pp. 92–99.
[11] S. Louis, G. Rawlins, Syntactic analysis of convergence in genetic algorithms, Morgan Kaufmann, San Mateo, CA, 1993 (Chapter 4).
[12] D. Weuster-Botz, V. Pramatarova, G. Spassov, C. Wandry, Use of a synthetic growth medium for Arthrobacter simplex with high hydrocortisone-1 activity, J. Chem. Technol. Biotechnol. 64 (1995) 386–392.
[13] P.L. Altman, D.S. Dittmer (Eds.), Blood and Other Body Fluids, Federation of American Societies for Experimental Biology, Bethesda, MD, 1961 (Third printing with minor corrections, 1971).
[14] H. Pohlheim, Genetic and Evolutionary Algorithm Toolbox for Use with Matlab-Documentation, Technical Report, Technical University Ilmenau, 1996.
[15] J. Huddleston, A. Veide, K. Köhler, J. Flanagan, S.-O. Enfors, A. Lyddiatt, The molecular basis of partitioning in aqueous two-phase systems, Trends Biotechnol. 9 (1991) 381–388.