A hybrid GMDH neural network to investigate partition coefficients of Penicillin G Acylase in polymer–salt aqueous two-phase systems


Journal of Molecular Liquids 188 (2013) 131–135


Journal of Molecular Liquids journal homepage: www.elsevier.com/locate/molliq

Gholamreza Pazuki a,⁎, Saeed Seyfi Kakhki b

a Department of Chemical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
b Seyfi Trading Company, Tehran, Iran

Article info

Article history: Received 2 August 2013; received in revised form 18 September 2013; accepted 1 October 2013; available online 11 October 2013

Keywords: Partitioning; Aqueous two-phase systems; GMDH; Neural network

Abstract

To model partition coefficients of Penicillin G Acylase in polymer–salt aqueous two-phase systems (ATPS), a hybrid GMDH neural network is presented on the basis of the original GMDH approach. Two major amendments are made to the structure of the original approach in order to enhance the model's complexity and its ability to trace high-order non-linearity. An extensive data set measured by Pazuki et al. is examined with the original GMDH approach and with the hybrid GMDH neural network model. Compared with the results of the original GMDH approach and the UNIFAC-FV model, the hybrid model tracks the data trend noticeably better. The Average Absolute Deviation percent (AAD%) of the proposed hybrid model is 4.04%, compared with 6.91% for the original GMDH approach and 5.58% for the UNIFAC-FV model.

Crown Copyright © 2013 Published by Elsevier B.V. All rights reserved.

1. Introduction

Separation and purification of biomaterials such as amino acids, proteins and antibiotics have always been a major concern in the pharmaceutical and nutrition industries. Various methods are used to separate biomaterials, among which precipitation, solvent extraction and membrane extraction are the most common classical ones. As biomaterials are extremely sensitive to changes in their environment such as shear stress, temperature and pH, new methods have been devised to provide a mild medium for separation and purification. The high water content of aqueous two-phase systems (ATPS) offers a suitable environment that protects biomaterials from damage during the process [1–3].

To date, partition coefficients of different biomaterials in aqueous two-phase systems have been measured experimentally. Through these experiments, the sensitivity of partitioning to a variety of effects, such as temperature, pH and the concentrations of polymer and salt in the feed, has been studied [4–8]. Because experimental measurement of partition coefficients of biomaterials in the systems of interest is time-consuming and costly, mathematical modeling of these systems is of great interest.

Several comprehensive studies on aqueous two-phase systems provide a background for ATPS-based separation processes. Peng et al. combined the adsorption lattice model of Baskir with Pitzer's model in a new model that accounts for short- and long-range electrostatic interactions [9]. Their model reproduces the measured partition coefficients of different proteins in aqueous two-phase systems of PEG 4000 and KH2PO4–K2HPO4 with good accuracy. A semi-empirical equation for the correlation and prediction of partition coefficients of hydrolytic enzymes in aqueous two-phase systems was put forward by Furuya et al., who combined the phase equilibrium criteria of liquid–liquid systems with a modified Flory–Huggins thermodynamic model to examine systems containing DEX and PEG [10]. Using an osmotic virial expansion model built on group contribution theory, Großmann et al. modeled partition coefficients of amino acids in aqueous two-phase systems [11]. Madeira et al. supposed that the charge on a protein molecule changes with any variation in pH [12]. On this basis, they proposed a modified Wilson model to study the partitioning of proteins in polymer–salt aqueous two-phase systems. The model uses no adjustable interaction parameters between components.

Pazuki et al. introduced a non-random mixture assumption, on which a modified version of the Wilson model was developed to investigate the phase behavior of polymer–polymer and polymer–salt aqueous two-phase systems containing various biomolecules [13]. They also enhanced the model by using group contribution theory together with a combinatorial term based on a new Freed-FV model [14]. The resulting model was applied to examine the phase behavior of solutions containing biomolecules, especially partitioning in aqueous two-phase systems. More recently, Pazuki et al. designed a neural network model that depends on the weight percent of components in the feed, the temperature, the molecular weight of the polymers and the difference between the weights of components in the upper and lower phases to study partition coefficients of biomolecules in aqueous two-phase systems of DEX and PEG [15]. Partition coefficients of Penicillin G Acylase in aqueous two-phase systems containing high molecular weight PEG and different salts, namely monosodium phosphate and sodium citrate, were measured in an exhaustive study by Pazuki et al. [16].

⁎ Corresponding author. Tel.: +98 64543159. E-mail address: [email protected] (G. Pazuki).

0167-7322/$ – see front matter. Crown Copyright © 2013 Published by Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.molliq.2013.10.001


G. Pazuki, S. Seyfi Kakhki / Journal of Molecular Liquids 188 (2013) 131–135

The emergence of intelligent systems has widely inspired fields of science and engineering in which the accuracy of sophisticated theories does not repay the time and effort spent on solving the derived equations. Simple theories, on the other hand, do not provide enough accuracy for the prediction of experimental data. Although intelligent systems do not provide theory-derived equations, the mathematical equations they produce are both accurate and easy to handle. The objective of this work is to develop a hybrid Group Method of Data Handling (GMDH) neural network which compensates for the built-in drawbacks of the original GMDH. The hybrid network achieves an AAD of 4.04%, which is well below those of the original GMDH approach and the UNIFAC-FV model.

2. Model section

2.1. GMDH model

The interpretation of Darwin's evolutionary postulate in the realm of cybernetics led to the advent of intelligent algorithms, which opened a new horizon of computation by handling complex, unstructured and ambiguous systems in economics, thermodynamics, weather forecasting, genetics and so forth. In this regard, a multitude of new algorithms such as Genetic Algorithms (GA), Neural Networks (NN), the Group Method of Data Handling (GMDH) and combinations of them known as Hybrid Self-organizing Systems emerged with higher predictive power. As Darwin hypothesized, nature selects the fittest species and eliminates weaker ones. The fittest species in turn breed offspring, among which the fittest get a chance to survive. In this way, the traits of the surviving species become dominant in the next generation.

The Group Method of Data Handling algorithm [17,18], first introduced by Ivakhnenko, follows the philosophy of Darwin's theory of natural selection. The algorithm is based on selecting the most appropriate quadratic polynomial expressions built by combining two independent variables at a time. As the algorithm iterates, a general multinomial expression is gradually built up. The overall correlation multinomial which models the entire system takes the Volterra–Kolmogorov–Gabor (VKG) form [18]:

$$y_i = a + \sum_{i=1}^{N_v} b_i x_i + \sum_{i=1}^{N_v}\sum_{j=1}^{N_v} c_{ij}\, x_i x_j + \dots + \sum_{i=1}^{N_v}\sum_{j=1}^{N_v}\cdots\sum_{k=1}^{N_v} d_{ij\ldots k}\, x_i x_j \cdots x_k \tag{1}$$

where the parameters a, b, c and d are polynomial coefficients set by the algorithm and $N_v$ stands for the number of independent variables. A data set of N observations can be structured in the form of a matrix as depicted in Fig. 1. The left matrix holds the vector of observed results, $\vec{V}_y = (y_1, y_2, \ldots, y_n)$, while the right one represents the vector of independent variables, $\vec{V}_x = (x_1, x_2, \ldots, x_n)$. As mentioned above, a quadratic polynomial in terms of a combination of two independent variables at a time can be proposed to express the actual data. So, out of M variables, $\binom{M}{2}$ quadratic polynomials can be expressed:

$$z_i^{\mathrm{GMDH}} = aA_i + bB_i + cA_iB_i + dA_i^2 + eB_i^2 + f \tag{2}$$

Now, the matrix of independent variables can be defined in terms of the vector of new variables $\vec{V}_z = (z_1, z_2, \ldots, z_n)$. The coefficients of Eq. (2) are calculated by means of the Least Square Method. The objective is to minimize the square of the deviation from the actual data for each column:

$$\delta_j^2 = \sum_{i=1}^{N_t} \left[ y_i - z_i^{\mathrm{GMDH}} \right]^2, \qquad j = 1, 2, \ldots, \binom{M}{2} \tag{3}$$

where $N_t$ stands for the number of data used for training the system. The observed data set is divided into two sets: a training set and a testing set. The ratio of training data to testing data is arbitrarily chosen. The training set is used to determine the coefficients of Eq. (2). The testing set is used to select the most appropriate combinations of variables ($z_i$). The deviation of the predicted results from the actual testing data must meet a pre-defined criterion:

$$\delta_j^2 = \sum_{i=N_t+1}^{N} \left[ y_i - z_i^{\mathrm{GMDH}} \right]^2 < \varepsilon, \qquad j = 1, 2, \ldots, \binom{M}{2} \tag{4}$$
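One selection step built from Eqs. (2)–(4) can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names `quad_features` and `gmdh_layer`, the synthetic data, the 43/5 split and the threshold value are all assumptions made for the example.

```python
from itertools import combinations

import numpy as np


def quad_features(a, b):
    """Design matrix of Eq. (2): columns A, B, A*B, A^2, B^2 and a constant."""
    return np.column_stack([a, b, a * b, a**2, b**2, np.ones_like(a)])


def gmdh_layer(X, y, n_train, eps):
    """Fit one quadratic node per variable pair and keep those passing Eq. (4).

    The first n_train rows are the training set, the remaining rows the
    testing set. Returns (pair, coefficients, test deviation) tuples sorted
    by increasing test deviation.
    """
    kept = []
    for i, j in combinations(range(X.shape[1]), 2):  # M-choose-2 candidates
        F = quad_features(X[:, i], X[:, j])
        # Least-squares coefficients from the training rows only (Eq. (3)).
        coef, *_ = np.linalg.lstsq(F[:n_train], y[:n_train], rcond=None)
        # Sum of squared deviations on the testing rows (Eq. (4)).
        delta = float(np.sum((y[n_train:] - F[n_train:] @ coef) ** 2))
        if delta < eps:
            kept.append(((i, j), coef, delta))
    return sorted(kept, key=lambda node: node[2])


# Hypothetical data: 48 observations of 6 variables, with a response that
# depends only on the first two variables.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(48, 6))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] * X[:, 1]
nodes = gmdh_layer(X, y, n_train=43, eps=1e-6)
best_pair = nodes[0][0]
```

In a full GMDH run the retained z columns would become the inputs of the next layer, and the loop would stop once the total deviation no longer decreases.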

The threshold ε in Eq. (4) is arbitrarily chosen. The z columns which meet the criterion are stored and those which fall short are omitted. The total deviation corresponding to each iteration is saved and compared with that of the previous iteration until the minimum value is reached. A schematic of the GMDH network is shown in Fig. 2.

Fig. 1. Observed data structured as an N × N_v matrix.

Fig. 2. A schematic of the GMDH network.

2.2. Hybrid GMDH neural network

The aim of the current work is to propose a GMDH-type neural network which models partition coefficients of Penicillin G Acylase in different polymer–salt aqueous two-phase systems (ATPS). In the original GMDH approach, candidate variables are selected two at a time in each layer. This excludes the effect of the other variables, which in turn produces less precise nodal polynomials that are unable to follow the trend of highly non-linear systems. To overcome this drawback of the original method, a hybrid system of GMDH and neural network is proposed. It is assumed that each node can be generated from any combination of input variables, provided that the order of the polynomial does not exceed two:

$$z_i^{\mathrm{Hybrid}} = a + \sum_{i=1}^{N_v} b_i x_i + \sum_{i=1}^{N_v}\sum_{j=1}^{N_v} c_{ij}\, x_i x_j \tag{5}$$
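A node of the form of Eq. (5) can be fitted by ordinary least squares, as the following Python sketch suggests. It is a sketch under stated assumptions rather than the authors' code: the function names and the synthetic data are hypothetical, the symmetric products x_i x_j and x_j x_i are merged into one column (i ≤ j), and every term is retained, whereas a practical implementation would presumably prune terms.

```python
from itertools import combinations_with_replacement

import numpy as np


def full_quadratic_features(X):
    """Columns of Eq. (5): a constant, every x_i, and every x_i * x_j with i <= j."""
    n, m = X.shape
    cols = [np.ones(n)] + [X[:, i] for i in range(m)]
    cols += [X[:, i] * X[:, j]
             for i, j in combinations_with_replacement(range(m), 2)]
    return np.column_stack(cols)


def fit_hybrid_node(X, y):
    """Least-squares coefficients of the full second-order polynomial."""
    coef, *_ = np.linalg.lstsq(full_quadratic_features(X), y, rcond=None)
    return coef


def predict_hybrid_node(X, coef):
    """Evaluate the fitted node on new inputs."""
    return full_quadratic_features(X) @ coef


# Hypothetical data: 48 observations of 6 variables and a response that
# mixes three different variable combinations in a single node.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(48, 6))
y = 0.5 + X[:, 0] * X[:, 3] - 2.0 * X[:, 2] ** 2 + X[:, 5]
coef = fit_hybrid_node(X, y)
max_err = float(np.max(np.abs(predict_hybrid_node(X, coef) - y)))
n_terms = coef.size
```

For six inputs the full expansion has 1 + 6 + 21 = 28 coefficients, which is large compared with 48 observations; this is one reason why some form of term selection is likely needed in practice.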

Another modification allows nodes in each layer to cross over with any independent variables and with nodes in previous layers. This adds complexity to the nodal expressions. As the number of terms and of combinations among nodes increases, the possibility of generating more accurate nodal expressions increases and the entire model becomes better able to simulate non-linear systems. It is assumed that the partition coefficient of the antibiotic (K) is a function of the independent variables temperature (T), pH, tie line length (TLL) and the weight fractions of polymer (xp), salt (xs) and antibiotic (xa) in the feed stream.

3. Results and discussion

A set of 48 data points is gathered from the work of Pazuki et al. [16]. Both the original GMDH network and the hybrid GMDH neural network are run to study partition coefficients of Penicillin G Acylase in aqueous two-phase systems. In a separate study, Pazuki et al. applied the UNIFAC-FV model to the same data [14]. We used the results of that study to check the accuracy of our hybrid GMDH neural network. The ratio of training data to testing data for both models is chosen as 9.

Table 1. Performance criteria for the hybrid model.

Model         Data set   RMSE       MAE        SD
Hybrid GMDH   Training   0.053905   0.038441   0.073036
Hybrid GMDH   Testing    0.02512    0.019643   0.039047

Fig. 3. A schematic of the proposed hybrid GMDH neural network.

Table 2. Nodal expressions for the hybrid GMDH neural network.

Layer 1
Node 1: Z1 = 1.243x2 + 0.03941x2x3 − 0.2355x2² − 0.1575x3 − 0.0006114x3²
Node 2: Z2 = −626.7 + 4.204x1 + 0.0278x1x2 − 0.007086x1² − 7.823x2 − 0.08912x2x4 + 0.7179x4 − 0.01328x4²
Node 3: Z3 = 0.000215x1x6 − 0.00108x6²
Node 4: Z4 = 1.75x5 + 0.06175x6 − 0.0009859x6²
Node 5: Z5 = −495.4 + 3.094x1 − 0.005049x1² + 0.3865x3 − 0.02537x3x4 − 0.002513x3² + 3.45x4 − 0.1333x4²

Layer 2
Node 1: W1 = −4.404 − 10.21Z2Z5 + 7.357Z3Z5 + 10.66Z2 − 3.816Z3² + 1.423Z5²
Node 2: W2 = −14.55 − 0.3001x4 + 0.01404x4² + 19.18Z1 + 15.03Z5 − 19.22Z1Z5 + 2.142Z5²
Node 3: W3 = −4.404 + 10.66Z2 − 10.21Z2Z5 + 7.357Z3Z5 − 3.816Z3² + 1.423Z5²

Genome expression
K = 4.856W1W2 − 2.053W1² − 7.571W2 − 6.305W2W3 + 4.891W2² + 8.549W3 − 1.35W3²

Note: Temperature (x1), pH (x2), weight fraction of polymer in feed (x3), weight fraction of salt in feed (x4), weight fraction of antibiotic in feed (x5).

The developed hybrid GMDH neural network is applied to model partition coefficients of Penicillin G Acylase in polymer–salt aqueous two-phase systems. The performance criteria of Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Standard Deviation (SD) for the training and testing steps of the hybrid model are reported in Table 1. They are defined as:

$$\mathrm{RMSE} = \left[ \frac{\sum_{i=1}^{N} \left( y_i^{\mathrm{Hybrid}} - y_i^{\mathrm{actual}} \right)^2}{N} \right]^{1/2} \tag{6}$$

$$\mathrm{MAE} = \frac{\sum_{i=1}^{N} \left| y_i^{\mathrm{Hybrid}} - y_i^{\mathrm{actual}} \right|}{N} \tag{7}$$

$$\mathrm{SD} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2}, \qquad \bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} \tag{8}$$
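The criteria of Eqs. (6)–(8), together with the percentage deviation (%AAD) used to compare the models, can be computed as in the following Python sketch. The function names are hypothetical. Eq. (8) does not state which quantity x denotes, so `sd` is written for an arbitrary sample. The usage lines take the first four actual values and hybrid predictions listed in Table 3.

```python
import numpy as np


def rmse(y_pred, y_true):
    """Root mean square error, Eq. (6)."""
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(d**2)))


def mae(y_pred, y_true):
    """Mean absolute error, Eq. (7)."""
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(d)))


def sd(x):
    """Sample standard deviation with the N - 1 denominator, Eq. (8)."""
    return float(np.std(np.asarray(x, dtype=float), ddof=1))


def aad_percent(y_pred, y_true):
    """Average absolute deviation relative to the actual values, in percent."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(100.0 * np.mean(np.abs((y_pred - y_true) / y_true)))


# First four rows of Table 3: actual K and hybrid-model predictions.
actual = [0.782, 0.885, 0.852, 0.551]
hybrid = [0.802, 0.829, 0.916, 0.664]
print(rmse(hybrid, actual), mae(hybrid, actual), aad_percent(hybrid, actual))
```

Because the tabulated values are rounded to three decimals, deviations recomputed this way can differ slightly from the per-row %AAD entries printed in Table 3.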

A schematic of the proposed hybrid model is shown in Fig. 3. The hybrid model has one input layer, two middle layers with eight nodes in total and one output layer. As seen, node 2 in layer 2 is connected to node 4 in the input layer, which shows that a crossover occurred. The expressions generated for each node, as well as the total correlation function, or genome, of the model, are presented in Table 2. Nodal expressions range from the simple two-termed Z3 to the more complicated eight-termed Z2. The actual and predicted results, together with the related Average Absolute Deviation (%AAD), are reported for both models in Table 3. The predicted results show that the hybrid network agrees more closely with the actual data than the original method does.

$$\%\mathrm{AAD} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{y_i^{\mathrm{model}} - y_i^{\mathrm{actual}}}{y_i^{\mathrm{actual}}} \right| \tag{9}$$

Table 3. Comparison between hybrid and original GMDH.

No.     Actual   Hybrid      Hybrid    Original    Original
                 predicted   %AAD      predicted   %AAD
1       0.782    0.802       2.646     0.918       17.373
2       0.885    0.829       6.315     0.920       3.935
3       0.852    0.916       7.527     0.938       10.040
4       0.551    0.664       20.578    0.917       66.402
5       0.926    0.824       10.980    0.926       0.014
6       1.048    0.980       6.400     0.906       13.531
7       1.019    1.014       0.487     0.946       7.208
8       0.860    0.893       3.866     0.926       7.660
9       0.984    0.977       0.666     0.979       0.500
10      0.964    0.988       2.564     0.989       2.585
11      0.938    0.956       1.951     0.999       6.478
12      0.993    1.011       1.881     0.979       1.402
13      0.995    0.899       9.550     0.918       7.753
14      0.924    0.898       2.720     0.916       0.877
15      0.938    0.926       1.273     0.925       1.413
16      0.941    0.935       0.607     0.920       2.250
17      1.093    0.998       8.686     0.946       13.490
18      1.02     1.024       0.470     0.949       7.009
19      1.011    1.005       0.518     0.946       6.473
20      1.05     1.022       2.626     0.954       9.104
21      0.822    0.973       18.479    0.959       16.715
22      0.851    0.905       6.389     0.959       12.738
23      0.848    0.952       12.337    0.950       11.976
24      0.997    0.896       10.099    0.959       3.771
25      0.972    0.987       1.523     0.941       3.140
26      0.954    0.983       3.043     0.940       1.519
27      0.98     1.058       8.006     0.941       3.931
28      1.007    1.001       0.602     0.946       6.018
29      0.977    0.996       1.911     0.946       3.219
30      1.000    0.930       7.047     0.946       5.445
31      0.951    0.970       1.957     0.946       0.573
32      0.971    0.993       2.317     0.946       2.621
33      1.023    0.970       5.216     0.965       5.640
34      0.997    0.944       5.310     0.970       2.686
35      1.019    1.040       2.012     0.999       1.986
36      1.012    0.986       2.586     0.993       1.892
37      0.961    0.954       0.729     0.935       2.748
38      0.955    0.950       0.525     0.933       2.343
39      1.015    1.017       0.184     0.938       7.632
40      0.988    0.989       0.107     0.941       4.709
41      1.005    1.003       0.179     0.946       5.915
42      1.013    1.007       0.618     0.946       6.658
43      1.023    1.016       0.699     0.946       7.571
44      1.020    1.011       0.861     0.946       7.299
45      1.011    1.013       0.161     0.979       3.157
46      1.004    0.961       4.323     0.969       3.462
47      0.999    0.987       1.208     0.969       2.979
48      1.054    1.022       3.068     0.969       8.042
Total                        4.037                 6.914

Fig. 4 depicts the predicted partition coefficients against the actual data. As seen in Fig. 4(B), datum number 4, with an actual value of K = 0.551, falls far from the identity line. This data point is an exception which does not follow a logical trend even in the experimental observations. However, this data point was not omitted. Instead, the authors modified the model to find out whether it could predict this datum with fair accuracy. The modifications increase the predictive power of the hybrid network and shift the predicted value closer to the actual point (Fig. 4(A)). Fig. 5 plots the predicted and actual data for each observation against the observation number. The hybrid model not only falls closer to the actual data but also traces the trend of the data more realistically. A comparison is made in Table 4 between the %AAD of the proposed hybrid model, the original model and the UNIFAC-FV model. It is inferred that the hybrid GMDH neural network is more successful in predicting partition coefficients of the antibiotic in aqueous two-phase systems (ATPS).

Fig. 4. Predicted partition coefficients plotted against actual data. (A) Hybrid GMDH neural network, (B) GMDH network.

Fig. 5. Predicted partition coefficients plotted against data number. (A) Hybrid GMDH neural network, (B) GMDH network. (●) Training data set. (▲) Testing data set. (2) Model results.

Table 4. Comparison of GMDH with thermodynamic model.

Model           %AAD
Original GMDH   6.91
Hybrid GMDH     4.04
UNIFAC-FV       5.58

4. Conclusion

In the current work, a hybrid GMDH neural network is applied to model partition coefficients of Penicillin G Acylase in polymer–salt aqueous two-phase systems (ATPS). The GMDH network is a suitable approach for modeling sophisticated unstructured systems. However, since the original model generates simple quadratic nodal expressions which take into account only the combination of two independent variables at a time, this approach falls short of producing good results. In the current work, two amendments are made to the original method. First, the proposed hybrid GMDH neural network considers a quadratic multinomial that incorporates all independent variables at a time. Second, the hybrid model can cross over different nodes in different layers. Despite the high-order non-linearity of the system, the hybrid model is capable of generating good results with a %AAD of 4.04, which indicates a noticeable advantage over the prevalent thermodynamic model, UNIFAC-FV.

References

[1] P.A. Albertsson, Partitioning of Cell Particles and Macromolecules, Wiley-Interscience, New York, 1986.
[2] R.K. Scopes, Protein Purification: Principles and Practice, Springer-Verlag, 1994.
[3] B.Y. Zaslavsky, Aqueous Two-phase Partitioning: Physical Chemistry and Bioanalytical Applications, Marcel Dekker, New York, 1995.
[4] Y. Liu, Z. Wu, Y. Zhang, H. Yuan, Biochem. Eng. J. 69 (2012) 93–99.
[5] M.T. Zafarani-Moattar, Sh. Hamzehzadeh, Biotechnol. Prog. 27 (2011) 986–997.
[6] C.F.C. Marques, T. Mourão, C.M.S. Neves, Á.S. Lima, I. Boal-Palheiros, J.A.P. Coutinho, M.G. Freire, Biotechnol. Prog. 29 (2013) 645–654.
[7] Sh. Shahriari, S. Ghayour Doozandeh, G.R. Pazuki, J. Chem. Eng. Data 57 (2012) 256–262.
[8] M. Yavari, G.R. Pazuki, M. Vossoughi, S.A. Mirkhani, A.A. Seifkordi, Fluid Phase Equilib. 337 (2013) 1–5.
[9] Q. Peng, Z. Li, Y. Li, Fluid Phase Equilib. 107 (1995) 303–315.
[10] T. Furuya, S. Yamada, J. Zhu, J. Yamaguchi, Y. Iwai, Y. Arai, Fluid Phase Equilib. 125 (1996) 89–102.
[11] C. Großmann, R. Tintinger, J. Zhu, G. Maurer, Fluid Phase Equilib. 137 (1997) 209–228.
[12] P.P. Madeira, X. Xu, J.A. Teixeira, E.A. Macedo, Biochem. Eng. J. 24 (2005) 147–155.
[13] G.R. Pazuki, V. Taghikhani, M. Vossoughi, Z. Phys. Chem. 223 (2009) 263–278.
[14] G.R. Pazuki, V. Taghikhani, M. Vossoughi, Ind. Eng. Chem. Res. 48 (2009) 4109–4118.
[15] G.R. Pazuki, V. Taghikhani, M. Vossoughi, Particulate Sci. Technol. 28 (2010) 67–73.
[16] G.R. Pazuki, V. Taghikhani, M. Vossoughi, J. Chem. Eng. Data 55 (2010) 243–248.
[17] A.G. Ivakhnenko, Sov. Autom. Control 13 (1966) 43–71.
[18] A.G. Ivakhnenko, IEEE Trans. Syst. Man Cybern. 1 (1971) 364–378.