Deep hybrid modeling of chemical process: Application to hydraulic fracturing

Deep hybrid modeling of chemical process: Application to hydraulic fracturing

Deep hybrid modeling of chemical process: Application to hydraulic fracturing Journal Pre-proof Deep hybrid modeling of chemical process: Applicatio...

2MB Sizes 0 Downloads 84 Views

Deep hybrid modeling of chemical process: Application to hydraulic fracturing

Journal Pre-proof

Deep hybrid modeling of chemical process: Application to hydraulic fracturing Mohammed Saad Faizan Bangi, Joseph Sang-Il Kwon PII: DOI: Reference:

S0098-1354(19)31101-9 https://doi.org/10.1016/j.compchemeng.2019.106696 CACE 106696

To appear in:

Computers and Chemical Engineering

Received date: Revised date: Accepted date:

18 October 2019 27 November 2019 22 December 2019

Please cite this article as: Mohammed Saad Faizan Bangi, Joseph Sang-Il Kwon, Deep hybrid modeling of chemical process: Application to hydraulic fracturing, Computers and Chemical Engineering (2019), doi: https://doi.org/10.1016/j.compchemeng.2019.106696

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Ltd.

Highlights • Integration of deep neural networks with first principles for hybrid modeling. • Deep neural networks are trained using Levenberg-Marquardt algorithm. • Application of deep hybrid modeling framework for hydraulic fracturing.

1

Deep hybrid modeling of chemical process: Application to hydraulic fracturing Mohammed Saad Faizan Bangia,b , Joseph Sang-Il Kwona,b,∗ a

Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77845 USA b Texas A&M Energy Institute, Texas A&M University, College Station, TX 77845 USA

Abstract Process modeling began with the use of first principles resulting in ‘white-box’ models which are complex but accurately explain the dynamics of the process. Recently, there has been tremendous interest towards data-based modeling as the resultant ‘black-box’ models are simple, and easy to construct, but their accuracy is highly dependent on the nature and amount of training data used. In order to balance the advantages and disadvantages of ‘white-box’ and ‘black-box’ models, we propose a hybrid model that integrates first principles with a deep neural network, and applied it to hydraulic fracturing process. The unknown process parameters in the hydraulic fracturing process are predicted by the deep neural network and then utilized by the first principles model in order to calculate the hybrid model outputs. This hybrid model is easier to analyze, interpret, and extrapolate compared to a ‘black-box’ model, and has higher accuracy compared to the first principles model. Keywords: Deep learning, first principles, hybrid modeling, Levenberg-Marquardt algorithm, hydraulic fracturing. 1. Introduction Process modeling is the task of obtaining a mathematical representation for knowledge about any physical process (Cameron and Hangos, 2001). Depending on the nature of knowledge, models can be classified into various categories. First principles or mechanistic models also known as ‘white box’ models are obtained using the mass and energy conservation laws, kinetic laws, thermodynamic laws, transport laws, etc. This class of models is transparent and easy-to-understand as they usually ∗

Corresponding author: Tel: +1 (979) 862-5930; E-mail: [email protected].

Preprint submitted to Computers & Chemical Engineering

December 26, 2019

contain parameters with physical meaning, valid over a wide range of operating conditions of the process, but are complex and computationally expensive to solve. On the other hand, data-driven models or ‘black box’ models are obtained using process data and computationally inexpensive to solve, but are usually difficult to interpret as the nature of parameters is unknown, and have narrow range of applicability. Hybrid models or ‘grey box’ models are a third class of models which are a combination of white box and black box submodels (Thompson and Kramer, 1994). In process modeling, the concept of hybrid (grey box) models evolved from the field of neural networks (Psichogios and Ungar, 1992; Kramer et al., 1992; Johansen and Foss, 1992; Su et al., 1993). The idea was to build neural network based hybrid models through the use of first principles knowledge. This resulted in hybrid models with better prediction accuracy compared to the first principles models, and better interpolation, extrapolation, and interpretation compared to solely neural networks based models. There exist other kinds of grey box models that combine different types of first principles knowledge and/or empirical submodels. During the 90s, the term ‘grey box’ models appeared in systems and control theory wherein the structural information from the first principles models was incorporated into the data-based models (Bohlin and Graebe, 1995; Jorgensen and Hangos, 1995; Tulleken, 1993). But understanding of the term ‘grey box’ has evolved to represent all types of hybrid models that combine first principles and data-based submodels. Nonetheless, hybrid modeling balances the advantages and disadvantages of strictly first principles and data-based modeling, and offers desirable benefits such as high prediction accuracy, better extrapolation capabilities, ease of calibration, and better interpretability. For these reasons, hybrid modeling has numerous applications in chemical and biochemical engineering. For instance in modeling of chemical reactor (Zahedi et al., 2011; Gupta et al., 1999; Molga and Cherba´ nski, 1999; Qi et al., 1999), polymerization processes (Tsen et al., 1996; Fiedler and Schuppert, 2008), crystallization (Lauret et al., 2000; Georgieva and de Azevedo, 2009), metallurgic processes (Reuter et al., 1993; Hu et al., 2011; Jia et al., 2011), distillation columns (Safavi et al., 1999; Mahalec and Sanchez, 2012), drying processes (Cubillos and Acu˜ na, 2007), thermal devices (Arahal et al., 2008), mechanical reactors (Nascimento et al., 1999), milling (Aguiar and Filho, 2001; Kumar Akkisetty et al., 2010), modeling of yeast fermentations (Schubert et al., 1994b,a; Eslamloueyan and Se-

3

toodeh, 2011), modeling of fungi cultivations (Preusting et al., 1996; Wang et al., 2010), modeling of bacteria cultivations (Simutis and Lbbert, 1997; Gnoth et al., 2008), modeling of mammalian cell cultivations (Dors et al., 1995; Teixeira et al., 2007), modeling of insect cell cultivations (Carinhas et al., 2011), modeling of hybridoma cell cultivations, (Fu and Barford, 1995, 1996), etc. For more information one can view von Stosch et al. (2014), an excellent review paper on hybrid modeling in the field of process systems engineering. More recently, Ghosh et al. (2019) proposed a hybrid modeling framework involving first principles integrated with subspace identification algorithm whose purpose is to minimize the mismatch between the process outputs and plant data. Building hybrid models is one way of taking advantage of the ‘big data’ available today. Apart from hybrid modeling, ‘big data’ can be utilized in multi-scale modeling and integrated decision making of process systems which usually involve chemical processes functioning on different time/length scales, and for more information on this one can read this excellent review paper by Tsay and Baldea (2019). Additionally, to understand the path forward and the challenges that lie ahead in chemical engineering related to integrating data-driven processing with scientific reasoning, one can review this excellent review paper written by Venkatasubramanian (2018). Recall, hybrid modeling started in 1992 from the use of neural networks along with first principles knowledge (Psichogios and Ungar, 1992; Kramer et al., 1992; Johansen and Foss, 1992; Su et al., 1993). Neural networks are connectionist models that map an input space to an output space and were inspired by the biological networks present in the brain. Each neural network contains elements called ‘neurons’ or ‘nodes’ that process an input and give an output, and comprises of layers containing many such nodes. Each node in a particular layer is connected to every node in an adjoining layer, and the strength of each connection is represented by an assigned weight. Consequently, a node in a particular layer receives inputs from all the nodes in the previous layer which are added together after weights are applied on each of them, and a ‘bias’ is added to the sum. If the neural network only contains weights and biases, then it behaves as a linear function. In order to capture nonlinearity in the input-output data, certain nonlinear functions called ‘Activation functions’ are used in each node. With activation functions, the input to the neural network undergoes nonlinear transformation through each layer and is collected as the output at the final

4

layer. Overall, the number of nodes, number of layers, weights in each connection, and the biases are the internal parameters of the neural network. Under certain assumptions, neural networks, if sufficiently large, have been shown to capture any nonlinear continuous function accurately (Stinchcombe and White, 1989). With the advancements in Machine Learning, the field of neural networks has evolved from the use of a single hidden layer to multiple hidden layers resulting in deep neural networks. Recently, there has been lots of interests in understanding and comparing the capabilities of shallow neural networks and deep neural networks as function approximators. For instance, it has been proven that deep sum-product networks for polynomial functions (Delalleau and Bengio, 2011), deep neural networks for piecewise smooth functions (Liang and Srikant, 2017), and three layer network for a specific function (Eldan and Shamir, 2016) require an exponentially less number of neurons than their shallow counterparts. Also, it has been shown that the number of linear regions of the functions approximated by deep neural networks is exponentially large compared to shallow neural networks (Montufar et al., 2014). These works demonstrate that shallow networks require an exponentially large number of neurons compared to deep networks for specific functions, and hence, depict the power of deep networks over shallow networks. In this work, we develop a hybrid model for a hydraulic fracturing process that combines its first principles model with a deep neural network that acts as an estimator of its unmeasured process parameters. The hydraulic fracturing process is a complex system in which the fracturing fluid containing proppant is pumped into the reservoir to create and sustain the created fractures in order to extract oil/gas through them from the reservoir. There are two main issues when modeling this process which are its moving boundary nature and numerous process uncertainties. One such uncertainty is the leak off rate of the fracturing fluid into the reservoir which is difficult to accurately explain using first principles, and building a first principles based model with such a knowledge gap will result in its inaccuracy. Therefore, the novelty in this work is two fold. First, the use of deep neural networks for the purpose of building hybrid models. Second, the application of this hybrid modeling methodology to a complex hydraulic fracturing process in order to explain one of its underlying fluid leak-off phenomena and build a superior model. The remainder of this text is organized as follows: Section 2 provides the methodology proposed

5

by Psichogios and Ungar (1992) to build a neural network based hybrid model. Section 3 gives a brief background about deep neural networks and the Levenberg-Marquardt algorithm used in this work to train them. Section 4 presents our proposed methodology to build a deep neural network based hybrid model. Section 5 presents the first principles model of hydraulic fracturing process utilized in this work, and Section 6 presents the application of our proposed technique. 2. Hybrid neural network model Consider a dynamical system with the following generic representation: dx = f (x, u, p) dt

(1)

p = g(x, u)

(2)

where x, u, and p denote the states, control inputs, and model parameters, respectively. The dynamics of the states of the system is explained by the equations in f which are dependent on parameters p. These parameters p are related to the states x and the inputs u through the equations in g. In many of the chemical engineering processes, the parameters p are complex and unknown. For instance, in biological reactions, the cell growth rate and the corresponding reaction kinetics are usually unknown and difficult to be derived from first principles. Consequently, the first principles models of such processes will contain unknown terms. Psichogios and Ungar (1992) developed the idea of a hybrid model wherein a first principles model works in conjunction with a neural network ‘black box’ model that can explain the unknown parameters. This methodology is superior when compared to the straightforward neural network-based ‘black box’ modeling of the entire system, in that it has an internal structure which clearly represents the interactions between the state variables and the parameters. This structuring allows for easier analysis and interpretation compared to the ‘black box’ model. Also, the hybrid model has better generalization and extrapolation capabilities as the basic structure; specifically, the first principles model is not affected during any process identification tasks and only the ‘black box’ model is altered. A schematic representation of the neural network based hybrid model is shown in Fig. 1. The neural network model’s inputs are the current state of the system xk and the external input applied

6

on the system uk , and its output is the current parameter value p. The output of the neural network is fed as input to the first principles model along with the current state and input, xk and uk , respectively. The first principles model then predicts the state at the next time step, i.e. xk+1 . Together, the neural network model and the first principles model constitute the hybrid model for the entire process. Before the neural network can be utilized, it has to be trained. A set of input and state measurements is utilized for this purpose. The neural network is initialized by defining its parameters, and an error signal is backpropagated through the network to update them. However, for the hybrid model, the output of the neural network is not directly available as it is not measured. Hence, the first principles model is utilized to calculate the error signal which is then used to update the parameters of the network. The network is fully trained when the error in the hybrid model’s predictions and the actual measurements fall below a predefined tolerance. Once trained, the neural network can be utilized along with the first principles to form the hybrid model. 3. Preliminaries In this section, the concept of deep neural networks is briefly discussed followed by the LevenbergMarquardt training algorithm utilized in this work. 3.1. Deep neural networks Deep neural networks are the neural networks with more than three layers and each layer contains multiple neurons. The neurons in each layer are fully connected to the neurons in the subsequent layer as shown in Fig. 2. The strength of each connection is assigned by its weight w, and each neuron processes the input through a nonlinear function called activation function f . Let nk+1 (i) be the net input to each unit i in layer k + 1 which is given by nk+1 (i) =

Sk X

wk+1 (i, j)ak (j) + bk+1 (i)

(3)

j=1

Let ak+1 (i) be the output of unit i in layer k + 1 which is given by ak+1 (i) = f k+1 (nk+1 (i))

7

(4)

Assuming there are M layers in the network, the equations in matrix form can be represented as Ak+1 = F k+1 (W k+1 Ak + B k+1 ), A 0 = uq ,

k = 0, 1, ..., M − 1

q = 1, 2, ..., Q

(5) (6)

where uq is the input vector to the neural network and its corresponding output is AM q . The matrices Ak , F k , W k , and B k contain the outputs, the activation functions, weights, and biases of all the neurons in layer k, respectively. The aim of the neural network is to learn the relationship between the input-output pairs {(u1 , y1 ), (u2 , y2 ), ....(uQ , yQ )}. The performance of the neural network is measured as follows:

Q

1X T V = e eq 2 q=1 q

(7)

eq = (yq − AM q )

(8)

where eq is the error when the q th input is given to the neural network. The error matrix E can be defined as follows: E = [e1 e2 . . . eQ ]T

(9)

3.2. Levenberg-Marquardt training The Error Backpropagation (EBP) algorithm (Rumelhart et al., 1986; Werbos, 1988) is one of the most significant breakthroughs in regard to the training of neural networks, but it has an issue of slow convergence, which can be explained using the following two reasons. First, the step size should be small to prevent oscillations around the required minima but this would lead to slow training process. Second, the curvature of the error surface could vary in different directions, so the classical ‘error valley’ problem (Osborne, 1992) could exist and may lead to slow convergence rate. Despite its issue of slow convergence, the steepest descent algorithm is widely used to train neural networks. Its update rule utilizes the gradient g, which is the first-order derivative of the total error function, is defined as follows:  ∂V (u, w) ∂V = g= ∂w ∂w1

8

∂V ∂w2

...

∂V ∂wN

T

(10)

The update rule of the steepest descent algorithm is written as: wk+1 = wk − αgk

(11)

where α is the learning constant. This issue of slow convergence in the steepest descent algorithm can be improved by the GaussNewton algorithm (Osborne, 1992). The Gauss-Newton algorithm utilizes the second-order derivatives of the error function to gauge the curvature of error surface and find the appropriate step size for each direction. The convergence is fast if the error function has a quadratic surface, without which this algorithm would be mostly divergent. The update rule of the Gauss-Newton method is defined as: wk+1 = wk − (JkT Jk )−1 Jk Ek where J is the Jacobian matrix which is defined as:  ∂e1 ∂e1  ∂w1 ∂w2   ∂e2 ∂e2  J =  ∂w1 ∂w. 2 .  .. ..   ∂eQ ∂eQ ∂w1 ∂w2

... ... .. . ...

 ∂e1 ∂wN   ∂e2   ∂wN     ∂eQ  ∂wN

(12)

(13)

The Levenberg-Marquardt algorithm (Levenberg, 1944; Marquardt, 1963) combines the steepest

descent method and the Gauss-Newton algorithm. Consequently, it inherits the stability characteristic of the steepest descent algorithm and the speed of the Gauss-Newton algorithm. The Levenberg-Marquardt algorithm functions as the steepest descent algorithm when the curvature of error surface is complex until the local curvature takes a reasonable approximation of a quadratic surface wherein the Levenberg-Marquardt can behave as the Gauss-Newton algorithm. Essentially, the Levenberg-Marquardt algorithm utilizes the superior aspects of the steepest descent algorithm and the Gauss-Newton algorithm alternatively based on the situational requirements. The update rule of the Levenberg-Marquardt algorithm is given by wk+1 = wk − (JkT Jk + µI)−1 Jk Ek

9

(14)

where µ is the combination coefficient, and I is the identity matrix. When the µ value is large, then the Levenberg-Marquardt algorithm behaves as the steepest descent method; otherwise, it behaves as the Gauss-Newton method. Algorithm EBP Gauss-Newton Levenberg-Marquardt

Parameter updates wk+1 = wk − αgk wk+1 = wk − (JkT Jk )−1 Jk Ek wk+1 = wk − (JkT Jk + µI)−1 Jk Ek

Convergence stable, slow unstable, fast stable, fast

Computational issue Gradient Jacobian Jacobian

Table 1: Summary of the training algorithm.

4. Proposed deep hybrid model Consider the dynamical system defined using Eqs. (1), (2) along with the following equation: y = h(x)

(15)

where y denotes the output of the system which is related to the states through the equation in h. In this proposed methodology, a DNN will be trained using data to predict the originally unknown parameter values p, and this trained DNN alongside a first principles model, will be used as shown in Fig. 3 to build a hybrid model. A DNN is initialized by assigning some values for the number of layers, number of neurons in each layer, type of activation function, and the initial values of weights and biases. A DNN has more than one hidden layer apart from the input and output layers. There are many kinds of activation functions being used today but widely used are Sigmoid, Hyperbolic tangent, Rectified linear unit (ReLU), and Leaky rectified linear unit (Leaky ReLU). If the input and output to the DNN vary over an order of magnitude, then this will cause issues during the training process. To avoid such scenarios, it is better to scale or make them dimensionless. 4.1. Training algorithm We use an input-output training data set (u1 , y1 ), (u2 , y2 ), ....(uq , yq ), ....(uQ , yQ ) for the purpose of training the hybrid model which includes a DNN to approximate the unknown parameters p. The inputs u are presented to the hybrid model, specifically to the first principles model and the

10

DNN. The inputs to the DNN are propagated through its layers and the network’s outputs are obtained at the final layer. Eqs. (3), (6) are utilized to calculate the outputs of the DNN which are the predicted parameter values p. These predictions are used as inputs to the first principles model 0

to calculate the outputs of the hybrid model y . For the deep hybrid model, the squared prediction error of the output for all Q training patterns was minimized as: Q

1X Vˆ = (eq )T (eq ) 2 q=1

(16)

eq = yq − yq

(17)

0

The DNN’s output pq does not explicitly appear in the above error equation as it is generated and utilized internally in the hybrid model. In order to update the parameters of DNN using Eqs. (13), (14), the effect of the DNN’s output pq on the prediction error of the hybrid model eq needs to be quantified. For this purpose, we utilize finite difference methods to calculate the gradient of 0

the hybrid model’s output yq with respect to the DNN’s output pq . Hence, we obtain the following equations: ∂eq = −1 ∂yq0 0

0

0

(18) 0

∂yq yq+1 − 2yq + yq−1 = ∂pq pq+1 − pq−1 0

0

0

(19) 0

yq+1 − 2yq + yq−1 ∂eq ∂eq ∂yq = 0 =− ∂pq ∂yq ∂pq pq+1 − pq−1

(20)

Let’s define the sensitivity of the error eq to changes in the net input nkq (i) of unit i in layer k when input vector uq is given to the DNN as: δqk (i) =

∂eq ∂nkq (i)

(21)

Now, the above equation can be rewritten using Eq. (4) as: δqk (i) =

∂eq ∂eq ∂akq (i) ∂eq ˙k k = = k f (n (i)) k k k ∂nq (i) ∂aq (i) ∂nq (i) ∂aq (i)

(22)

where akq (i) is the output of unit i in layer k when input vector uq is given to the DNN. For the last layer M, the above equation leads to: δqM =

∂eq ˙ M M F (nq ) ∂AM q

11

(23)

But the output AM q from the final layer M is the predicted parameter pq which results in ∂eq ∂eq = ∂pq ∂AM q

(24)

Using Eqs. (20), (23), (24), the δqM value can be calculated. Recall, the Jacobian matrix contains the sensitivities of the error eq to changes in the parameters of the DNN, i.e. W k and B k . In mathematical form, these sensitivities can be represented as

∂eq ∂W k

and

∂eq , ∂B k

and can be calculated

for the neurons in the final layer M using δqM and Eq. (3) in the following way: ∂eq −1 = δqM AM q ∂W M

(25)

∂eq = δqM ∂B M

(26)

For other layers, k = 1, ..., M −1, δqk value can be calculated using the following recurrence relation: T δqk = F˙ k (nkq ) W k+1 δqk+1

(27)

Using δqk value, the Jacobian matrix can be calculated in a manner similar to when obtaining Eqs. (25), (26). Following the calculation of the Jacobian matrix, the parameters of the DNN can be updated using Eq. (14). The Levenberg-Marquardt training algorithm starts with an initial value for µ but it is multiplied by a factor β whenever an update would result in the increase of Vˆ . On the other hand, when an update reduces the Vˆ value, then µ is divided by β. The training algorithm is repeated until the value of Vˆ reaches a predefined tolerance. The proposed algorithm has been summarized in Algorithm 1. Also, the flow diagram of the proposed hybrid modeling framework is presented with details in Fig. 6. Algorithm 1 Deep hybrid model training 0

1: Present all the inputs uq to the hybrid model and calculate its corresponding output yq . Compute the

errors eq using Eq. (17) and the sum of the squared of errors over all inputs Vˆ using Eq. (16). 2: Compute the Jacobian matrix using Eqs. (13), (20), (21), (22), (23), (24), (25), (26), (27) 3: Update the parameters of the DNN using Eq. (14) and recalculate Vˆ using Eq. (16). If it is smaller than that computed in Step 1, then reduce µ by β, and proceed to Step 1 with the new parameters of the DNN. If the Vˆ is greater than that computed in Step 1, then increase µ by β and repeat this step. 4: The algorithm is terminated when Vˆ is less than a predetermined value.

12

Remark 1. To reiterate, the important novelty of this proposed work is the algorithm which utilizes data for DNN-based hybrid model training. Our proposed algorithm is based on the LevenbergMarquardt optimization method and it includes a customized step for Jacobian calculation. Other than Levenberg-Marquardt algorithm, steepest descent or Gauss-Newton method are some of the popular techniques used for neural network training. But these methods solely cannot be directly applied for training the proposed DNN-based hybrid model. Remark 2. The Levenberg-Marquardt algorithm has faster convergence rate compared to the steepest descent method, and is stable compared to the Gauss-Newton method. These properties make it a superior algorithm compared to the other two methods. But the Levenberg-Marquardt has the issue of the calculation of Jacobian matrix. If the size of the neural network or the size of the training data matrix is large, then the calculation of the Jacobian matrix and consequently, the update of the parameters of the DNN would be computationally expensive or the algorithm could face memory issues. Therefore, our Levenberg-Marquardt based hybrid model training algorithm is more suitable to systems that require small sizes of DNNs to capture their unknown parameters and small training data sets. Otherwise, the steepest descent or the Gauss-Newton method would be more suitable for hybrid model training. 5. Hydraulic fracturing process Shale gas is the natural gas trapped in rocks with low porosity and permeability which is difficult to extract using conventional oil/gas extraction techniques. Horizontal drilling and hydraulic fracturing process enabled the extraction of oil/gas from such rocks. In a hydraulic fracturing process, controlled explosions are carried out inside the wellbore to create initial fractures followed by pumping of a clean fluid called pad to extend the geometry of these fractures. Later, a fracturing fluid containing water, proppant and additives is pumped inside the system to further extend these fractures. During and after the pumping process, the fracturing fluid leaks off into the reservoir leaving behind proppant in the fractures. The natural stresses present in the reservoir cause the closure of these fractures with proppant inside them. The presence of proppant creates a conductive medium for the oil/gas to flow through these fractures making their extraction possible. The phe-

13

nomenon of interest in this work is the leak off of the fracturing fluid into the reservoir during the hydraulic fracturing process as its knowledge is essential to obtain the optimal fracture geometry. Obtaining the fracture geometry requires huge amounts of fracturing fluid, and hence, fluid leak off indirectly affects the economic efficiency of the hydraulic fracturing process. In our past work on modeling and control of hydraulic fracturing process, we utilized a mathematical expression of time-dependent fracturing fluid leak-off rate. In practice, advancement in pressure analysis has made it possible to estimate the leak-off rate from pressure decline following injection of fracturing fluid. But this method requires the knowledge of gross fracture height, and hence, this method is sutable for the formations with a large net permeable height. As previously described, in this work, we utilize a DNN to estimate the fluid leak-off rate from input-output data of the hydraulic fracturing process as explained in the previous section. To achieve this objective, we also utilize a first principles based model of hydraulic fracturing process which comprises of 3 subprocesses: a) Fracture propagation, b) Proppant transport, and c) Proppant bank formation. One can refer Yang et al. (2017), Siddhamshetty et al. (2017), and Siddhamshetty et al. (2018) to understand the model used in our application. The subprocesses of hydraulic fracturing process are briefly discussed in the following subsections. 5.1. Fracture propagation We assume the fracture propagation follows the Perkins, Kern and Nordgren (PKN) model (Perkins and Kern, 1961; Nordgren, 1972) which is shown in Fig. 4. The assumptions included are as follows: a) the fracture length is assumed to be greater than the fracture width, and hence, the fluid pressure in the vertical direction remains constant; b) the fracture is confined to a single layer because of large stresses in the rock layers above and below the fractures; c) the fracturing fluid is assumed to be incompressible, and the rock properties such as Young’s modulus and Poisson’s ratio are also assumed to be constant. Following the assumptions, the fracture geometry and fracture cross-sectional area take an elliptical and rectangular shape, respectively. Lubrication theory is utilized to explain the fluid momentum inside the fracture which

14

relates the fluid flow rate, qz , in the horizontal direction to the pressure gradient, − ∂P zˆ, as follows: ∂z qz = −

πHW 3 ∂P 64µ ∂z

(28)

where H is the fracture height, W is the fracture width, and µ is the fracturing fluid viscosity. The maximum width of the fracture can be calculated using the net pressure exerted by the fracturing fluids as follows: W =

2P H(1 − ν 2 ) E

(29)

where ν is the Poisson’s ratio, and E is the Young’s modulus of the rock formation. The local mass conservation of an incompressible fluid leads to the continuity equation which is: ∂A ∂qz + + HU = 0 ∂t ∂z

(30)

where A = πW H/4 is the cross-sectional area of the fracture, U is the fluid leak off rate per unit height, t is the time elapsed since the beginning of pumping, and z is the spatial coordinate in the horizontal direction. Plugging Eqs. (28), (29) into Eq. (30) results in the following partial differential equation of W : " #  ∂W 2 2 πE ∂ W πH ∂W − 3W 2 + W 3 2 + HU = 0 4 ∂t 128µ(1 − ν 2 ) ∂z ∂z

(31)

The two boundary conditions and an initial condition for the process are shown below (Gu and Hoo, 2015): qz (0, t) = Q0

W (L(t), t) = 0,

W (z, 0) = 0

(32) (33)

where Q0 is the fracturing fluid rate at the wellbore, and L(t) is the length of the fracture varying with time. 5.2. Proppant transport In this model, it is assumed that the proppant moves at a rate equal to the superficial velocity of the fracturing fluid in the horizontal direction, and with the settling velocity relative to the fracturing fluid in the vertical direction. The other assumptions utilized are: (1) the size of proppant particle

15

is large enough that its diffusive flux is neglected and only its convective flux is considered; (2) the proppant particle-particle interactions are neglected while only drag and gravity effects are considered; and (3) the uniform size of proppant particles. Utilizing the assumptions in regard to proppant transport, the advection of proppant can be defined in the following manner: ∂(W C) ∂ + (W CVp ) = 0 ∂t ∂z

(34)

C(0, t) = C0 (t) and C(z, 0) = 0

(35)

where C(z, t) is the proppant concentration at distance z in the horizontal direction from the wellbore at time t, and C0 (t) is the proppant concentration at the wellbore varied with time. Vp is the net velocity of proppant and is calculated using the following equation (Adachi et al., 2007): Vp = V − (1 − C)Vs

(36)

where V is the superficial fluid velocity, and Vs is the gravitational settling velocity which is calculated using the following equation (Daneshy, 1978): Vs =

(1 − C)2 (ρsd − ρf )gd2 101.82C 18µ

(37)

where ρf is the pure fluid density, ρsd is the proppant particle density, g is the gravitational constant, d is the proppant particle diameter, and µ is the fracture fluid viscosity which is dependent on the proppant concentration C (Barree and Conway, 1995): µ(C) = µ0

C 1− Cmax

!−α

(38)

where µ0 is the pure fluid viscosity, α is an exponent in the range of 1.2 to 1.8, and Cmax is the maximum theoretical concentration calculated using the equation Cmax = (1 − φ)ρsd where φ is the proppant bank porosity. 5.3. Proppant bank formation When proppant settles down in a fracture, a proppant bank is formed whose height δ can be calculated using the following equations (Gu and Hoo, 2014; Novotny, 1977): d(δW ) CVs W = dt (1 − φ)

16

(39)

δ(z, 0) = 0

(40)

where Eq. (40) is the initial condition for Eq. (39). Remark 3. Considering the complexity of the hydraulic fracturing process because of its moving boundary nature and its first principles model involving parabolic partial differential equations along with other system equation, one can opt for its simpler reduced-order models to explain its dynamics. In our past work, we have developed multiple reduced-order modeling techniques and applied it to hydraulic fracturing process (Narasingam et al., 2017; Narasingam and Kwon, 2017; Narasingam et al., 2018; Sidhu et al., 2018; Narasingam and Kwon, 2018; Bangi et al., 2019). 6. Deep hybrid model for hydraulic fracturing process Recall, the deep hybrid model structure as shown in Fig. 3 is utilized to predict the unknown parameters of the first principles model. We utilized the first principles model of the hydraulic fracturing process discussed above along with a DNN to build a hybrid model. The objective of the DNN, once it is trained, is to predict the leak off rate U of the fracturing fluid into the reservoir by using time t as input. The input to the hybrid model is the fracturing fluid injection rate Q0 (t) and the output of the hybrid model is the width of the fracture Wz at various points z = 1, 2, 3......251 along the length of the fracture in the horizontal direction z. The schematic of the deep hybrid model for hydraulic fracturing process is shown in Fig. 5. Unlike the schematic of hybrid model shown in Fig. 3, wherein both the state and the input applied on the system are used as inputs to the DNN, we only utilize the time t as input to the DNN as seen in Fig. 5. For the DNN, we considered a fully connected network consisting of 5 layers, i.e. 3 hidden layers, 1 input layer, and 1 output layer. Each hidden layer contains 20 neurons, and the input and output layer contain 1 neuron each. Hyperbolic tangent was used as the activation function in the hidden layers and the linear function in the outer layers. To generate training data in order to train the hybrid model, we simulated the first principles model using a constant input flow rate of Q0 (t) = 0.03 m3 /s and the leak off model, as shown in Eq. (41) (Howard and Fast, 1957; Economides and Nolte, 2000), with the assumption that the leak off rate U is independent of spatial location z. The other parameters of the first principles model used in our process calculations are

17

as follows (Gu and Hoo, 2014): H = 20 m, µ = 0.56 Pa · s, E = 5 × 103 MPa, and ν = 0.2. The output from the simulation was the widths Wz at various points z = 1, 2, 3......251 along the length of the fracture. These 251 points are spaced equidistantly with a distance of 0.2 m between two consecutive spatial points and we collected 2841 time snapshots of the output data with a time difference of 0.1s between two consecutive snapshots. Since we have used the model in Eq. (41) for leak off rate when generating the simulation data assuming τ (z) to be constant, we expect the DNN in the hybrid model to capture a similar variation of the leak off rate U . 2Cleak U=p t − τ (z)

(41)

Now that the hybrid model structure and its components have been defined, and the training data is available, we initialized the parameters of DNN (i.e., weights W and biases B), and began the Levenberg-Marquadt based training algorithm as discussed above in Algorithm 1. The inputs Q0 (t) were presented to the hybrid model with the DNN utilizing t as its input and the first principles model utilizing Q0 (t) as its input. The input t to the DNN undergoes transformation through its layers and gives the output U (t) which is presented to the first principles model as an input. 0

The first principles model utilizes Q0 (t) along with U (t) to calculate the outputs Wz (t). Since the parameters of DNN were randomly initialized, the initial predictions of U(t) by the DNN is more likely inaccurate than its actual value which in turn will affect the error e(t) calculated using Eq. (17). As the algorithm progresses, the parameters of the DNN are updated, and hence, its predictions of U (t) will move closer to its actual value. A point to note here is that despite the hybrid model predicting widths at 251 locations, we only considered just the width at the wellbore, 0

i.e. W1 (t) in the objective function calculation Eq. (16), and hence in error e(t) calculation Eq. (14) as well. This is because we assumed earlier that U does not vary with spatial location, and hence, 0

the U (t) approximated by the DNN is applicable at all locations which implies that when the W1 (t) values predicted by the hybrid model move towards convergence by updating the parameters of the 0

DNN, the widths at other locations i.e., Wz where z = 2, 3......251 also move towards convergence. In order to reduce the error and reach the tolerance of 10−6 for Vˆ , we needed to update the parameters of DNN, i.e. its weights W and biases B using Eq. (14). To do so, we calculate the

18

Jacobian matrix as explained previously which includes initializing the value of µ. We used an initial value of µ as 10−2 and β as 2, and updated the weights. Using the updated weights, we recalculated Vˆ and continued with the training process as explained in Algorithm 1. Once the tolerance for Vˆ was reached, we stopped the training process. We compared the outputs (i.e., Wz of the hybrid model and the actual outputs from the training data) at the wellbore, and this comparison is presented in Fig. 7. It can be seen from Fig. 7 that the outputs obtained using the proposed deep hybrid model closely mimic the output from the training data. This indicates that the DNN has been well trained and the hybrid model accurately predicts the outputs Wz . We used a relative error metric, RE(t), to quantify the performance of the hybrid model in comparison to the training data. The relative error is calculated by using the Frobenius norms of the state vectors as follows: RE(t) =

kWtraining (t) − Whybrid (t)kf ro kWtraining (t)kf ro

(42)

where k.kf ro is the Frobenius norm, Wtraining (t) and Whybrid (t) are the width vectors obtained from the training data and the hybrid model, respectively, at time t. The relative error in the widths predicted by the hybrid model in comparison with the training data at different times is presented in Fig. 8. In order to compare the performance of the DNN in approximating the underlying phenomena of leak off, we used the obtained DNN and computed the leak off rate U values by presenting time inputs t. The predicted U values are compared against the actual leak off rate values obtained using Eq. (41), and the comparison is shown in Fig. 9. From Fig. 8, we note that the DNN has been well trained and was able to accurately approximate the underlying leak off rate in the training data. A point to observe here is that there is slight inaccuracy in the approximation of the leak off rate U by the DNN in the initial stages of pumping whose effect can also be seen in the larger relative errors as shown in Fig. 8. Remark 4. In general, the performance of the DNN is affected by hyper-parameters such as the number of layers and nodes in its structure. In our work we utilized trial and error method to come up with the final structure for the DNN in the hybrid model. The use of any sophisticated hyperparameter tuning methods is advised as it will quicken the selection of the optimal hyper-parameters

19

for the DNN. 6.1. Comparison of Deep hybrid model and black box model The deep hybrid model utilizes a DNN to approximate the unobserved process parameters. As discussed previously, the structure of the hybrid model is similar to the first principles model except the DNN which does not alter the nature and characteristics of the known parameters in the first principles model. The presence of the first principles model in the hybrid model enables it with better extrapolation properties. On the other hand, the parameters of a black box model, which are purely data-driven, do not carry any physical meaning and such models have narrow applicability. To quantitatively prove this point, we built a black box model using a DNN. As the structure of the DNN greatly affects not only its training performance but also the extrapolation properties of the black box model, it is necessary to select an optimal structure in terms of the number of layers and nodes. For this purpose we selected 9 DNNs with network characteristics that had two to four hidden layers, and 5, 10 and 20 neurons per hidden layer. Each DNN was trained using the same training data as used in the case of hybrid model. Next the performance of the DNNs was assessed using three different test inputs, i.e. Q0 = 0.02, 0.04, 0.05, and these inputs were never seen by the DNNs. The performances were quantified using the relative error as defined in Eq. (42), the average relative error (ARE) was calculated for each scenario and the values have been tabulated below. Based on the data presented in Table 2, the DNN with 4 layers and 20 neurons in each hidden layer Network Structure 4 layer, 5 neurons 4 layer, 10 neurons 4 layer, 20 neurons 5 layer, 5 neurons 5 layer, 10 neurons 5 layer, 20 neurons 6 layer, 5 neurons 6 layer, 10 neurons 6 layer, 20 neurons

Training ARE Q0 = 0.03 0.0108 0.0037 0.0019 0.0036 0.0033 0.0047 0.0048 0.0042 0.0015

Test ARE Q0 = 0.02 0.1091 0.1097 0.1097 0.1100 0.1095 0.1093 0.1093 0.1095 0.1098

Test ARE Q0 = 0.04 0.0716 0.0711 0.0709 0.0706 0.0710 0.0717 0.0712 0.0710 0.0708

Test ARE Q0 = 0.05 0.1229 0.1225 0.1224 0.1222 0.1225 0.1232 0.1227 0.1226 0.1223

Table 2: Average relative errors for different network structures

was selected as the black box model. A point to be noted here is that the network structure with 6

20

layers and and 20 neurons in each hidden layer can also be selected but this structure utilizes more resources in terms of the number of hidden layers and neurons but only slightly improves the black box model performance. Once the structure with 4 layers and 20 neurons in each hidden layer was finalized, we compared the training result of the black box model with the actual output from the training data. The comparison of the widths at the wellbore is shown in Fig. 10. It can be seen from Fig. 10 that the outputs obtained using the black box closely mimics the output from the training data. This indicates that the DNN has been well trained and the black box model accurately predicts the width when the training input Q0 = 0.03 m3 /s is presented to it. In contrast to the hybrid model, the parameters of the black box model (i.e., the DNN parameters) do not carry any physical meaning and hence, cannot give any insights about the process. To compare the extrapolation properties of the deep hybrid model and the black box model, we presented to them another input of Q0 = 0.04 m3 /s, which is a small deviation from the training input. Fig. 11 shows the comparison of the widths Wz from the hybrid model, black box model and the true data at the wellbore. Although the DNN-based black box model captures the trend in the outputs but clearly has poor accuracy, whereas the hybrid model accurately predicts the output when presented with the test input. An important point to note here is that the difference between the training input (Q0 = 0.03 m3 /s) and the test input (Q0 = 0.04 m3 /s) is small; however, the performance of the black box model varies considerably in both these cases as shown in Fig. 11. As explained previously, this inaccurately stems from the fact that the black box model can accurately approximate the relationship between the training inputs and the training outputs by utilizing various parameters which do not carry any physical significance. Hence, when a test input different from the training input is presented to it, the black-box model fails to replicate the accuracy seen in the case of training data. On the other hand, since the hybrid model retains the structure of the first principles model and its parameters carry physical meaning, it is able to accurately predict the output in the case of test input. 6.2. Comparison of Deep hybrid model and first principles model The main objective of the deep hybrid model is to develop a DNN that enhances the performance of the first principles model by accurately predicting its unknown parameters. In this work, we

21

developed a DNN to extract the parameter associated with the leak-off rate from the training data. In order to compare the performance of the hybrid model with only the first principles model, we assume a constant value for the leak off rate in the first principles model, and we do not make any changes to the deep hybrid model developed above. The three different cases of the first principles model were considered with three different values (U = 1.2 × 10−5 , 1.2 × 10−6 , 6.3 × 10−6 ) for the leak off rate. The comparison between the outputs at the wellbore is shown in the figure below. From Fig. 12, it can be seen that the hybrid model accurately predicts, as expected, when compared to the first principles model. This superior performance is attributed to the accurate prediction of the parameter associated with the leak-off rate by the DNN in the hybrid model. On the other hand, the first principles model with (U = 1.2 × 10−5 ) performed on par with the hybrid model but the process described by it takes longer time to reach the desired fracture length, and the first principles models with (U = 1.2 × 10−6 and 6.3 × 10−6 ) predicted poorly both in terms of accuracy as well as the time to reach the desired fracture length. The above comparison proves the superiority of hybrid model over first principles model in terms of accuracy as the hybrid model contains a DNN to accurately predict the leak off rate U values. A point to be noted here is that in all of the above results we utilized simulation data. In our future work, we plan to apply our proposed methodology for field/experimental data. 7. Conclusions In this work we proposed a deep neural network-first principles-based hybrid modeling strategy wherein the purpose of the deep neural network is to predict the unknown process parameters in the hydraulic fracturing process. We utilized simulation data for training the hybrid model, specifically, its deep neural network. Additionally, we utilized the Levenberg-Marquardt algorithm and finite difference-based sensitivity analysis to update the parameters of the deep neural network during the training process. Once trained, we utilized the hybrid model to prove its superior extrapolation properties compared to a purely data-driven deep neural network model. Also, we showed its superior accuracy compared to the first principles model as it utilizes a trained deep neural network to accurately predict the initially unknown process parameters.

22

8. Acknowledgment The authors gratefully acknowledge financial support from the National Science Foundation (CBET-1804407), the Department of Energy (DE-EE0007888-10-8), the Texas A&M Energy Institute, and the Artie McFerrin Department of Chemical Engineering.

23

Literature Cited Adachi, J., Siebrits, E., Pierce, A., Desroches, J., 2007. Computer simulation of hydraulic fractures. Int. J. Rock Mech. Min. Sci. 44, 739–757. Aguiar, H.C., Filho, R.M., 2001. Neural network and hybrid model: a discussion about different modeling techniques to predict pulping degree with industrial data. Chemical Engineering Science 56, 565 – 570. Arahal, M.R., Cirre, C.M., Berenguel, M., 2008. Serial grey-box model of a stratified thermal tank for hierarchical control of a solar plant. Solar Energy 82, 441 – 451. Bangi, M.S.F., Narasingam, A., Siddhamshetty, P., Kwon, J.S., 2019. Enlarging the domain of attraction of the local dynamic mode decomposition with control technique: Application to hydraulic fracturing. Ind. Eng. Chem. Res. 58, 5588–5601. Barree, R., Conway, M., 1995. Experimental and numerical modeling of convective proppant transport. J. Pet. Technol. 47, 216–222. Bohlin, T., Graebe, S.F., 1995. Issues in nonlinear stochastic grey box identification. International Journal of Adaptive Control and Signal Processing 9, 465–490. Cameron, I.T., Hangos, K., 2001. Process Modelling and Model Analysis. Academic Press. Carinhas, N., Bernal, V., Teixeira, A.P., Carrondo, M.J.T., Alves, P.M., Oliveira, R., 2011. Hybrid metabolic flux analysis: combining stoichiometric and statistical constraints to model the formation of complex recombinant products. BMC Systems Biology 5. Cubillos, F.A., Acu˜ na, G., 2007. Adaptive control using a grey box neural model: An experimental application. In Advances in Neural Networks - ISNN 2007 , 311–318. Daneshy, A., 1978. Numerical solution of sand transport in hydraulic fracturing. J. Pet. Technol. 30, 132–140.

24

Delalleau, O., Bengio, Y., 2011. Shallow vs. deep sum-product networks. In Advances in Neural Information Processing Systems 24, 666–674. Dors, M., Simutis, R., Lbbert, A., 1995. Advanced supervision of mammalian cell cultures using hybrid process models. In Computer Applications in Biotechnology , 72 – 77. Economides, M.J., Nolte, K.G., 2000. Reservoir stimulation. Wiley, Chichester. Eldan, R., Shamir, O., 2016. The power of depth for feedforward neural networks. In 29th Annual Conference on Learning Theory 49, 907–940. Eslamloueyan, R., Setoodeh, P., 2011. Optimization of fed-batch recombinant yeast fermentation for ethanol production using a reduced dynamic flux balance model based on artificial neural networks. Chemical Engineering Communications 198, 1309–1338. Fiedler, B., Schuppert, A., 2008. Local identification of scalar hybrid models with tree structure. IMA Journal of Applied Mathematics 73, 449 – 476. Fu, P.C., Barford, J., 1995. Integration of mathematical modelling and knowledge-based systems for simulations of biochemical processes. Expert Systems with Applications 9, 295 – 307. Fu, P.C., Barford, J., 1996. A hybrid neural network-first principles approach for modelling of cell metabolism. Comput. Chem. Eng 20, 951 – 958. Georgieva, P., de Azevedo, S., 2009. Computational intelligence techniques for bioprocess modelling, supervision and control. volume 218. Springer. Ghosh, D., Hermonat, E., Mhaskar, P., Snowling, S., Goel, R., 2019. Hybrid modeling approach integrating first-principles models with subspace identification. Ind. Eng. Chem. Res. 58, 13533– 13543. Gnoth, S., Jenzsch, M., Simutis, R., L¨ ubbert, A., 2008. Product formation kinetics in genetically modified e. coli bacteria: inclusion body formation. Bioprocess and Biosystems Engineering 31, 41–46.

25

Gu, Q., Hoo, K.A., 2014. Evaluating the performance of a fracturing treatment design. Ind. Eng. Chem. Res. 53, 10491–10503. Gu, Q., Hoo, K.A., 2015. Model-based closed-loop control of the hydraulic fracturing process. Ind. Eng. Chem. Res. 54, 1585–1594. Gupta, S., Liu, P.H., Svoronos, S.A., Sharma, R., Abdek-Khalek, N.A., Cheng, Y., El-Shall, H., 1999. Hybrid first-principles/neural networks model for column flotation. AIChE J. 45, 557–566. Howard, G.C., Fast, C.R., 1957. Optimum fluid characteristics for fracture extension. Drill. prod. pract. 24, 261–270. Hu, G., Mao, Z., He, D., Yang, F., 2011. Hybrid modeling for the prediction of leaching rate in leaching process based on negative correlation learning bagging ensemble algorithm. Comput. Chem. Eng. 35, 2611 – 2617. Jia, R., Mao, Z., Chang, Y., Zhao, L., 2011. Soft-sensor for copper extraction process in cobalt hydrometallurgy based on adaptive hybrid model. Chemical Engineering Research and Design 89, 722 – 728. Johansen, T.A., Foss, B.A., 1992. Representing and learning unmodeled dynamics with neural network memories. In 1992 American Control Conference , 3037–3043. Jorgensen, S.B., Hangos, K.M., 1995. Grey box modelling for control: Qualitative models as a unifying framework. International Journal of Adaptive Control and Signal Processing 9, 547–562. Kramer, M.A., Thompson, M.L., Bhagat, P.M., 1992. Embedding theoretical models in neural networks. In 1992 American Control Conference , 475–479. Kumar Akkisetty, P., Lee, U., Reklaitis, G.V., Venkatasubramanian, V., 2010. Population balance model-based hybrid neural network for a pharmaceutical milling process. Journal of Pharmaceutical Innovation 5, 161–168.

26

Lauret, P., Boyer, H., Gatina, J., 2000. Hybrid modelling of a sugar boiling process. Control Engineering Practice 8, 299 – 310. Levenberg, K., 1944. A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics 2, 164–168. Liang, S., Srikant, R., 2017. Why deep neural networks for function approximation?

In 5th

International Conference on Learning Representations . Mahalec, V., Sanchez, Y., 2012. Inferential monitoring and optimization of crude separation units via hybrid models. Comput. Chem. Eng. 45, 15 – 26. Marquardt, D., 1963. An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Applied Mathematics 11, 431–441. Molga, E., Cherba´ nski, R., 1999. Hybrid first-principle-neural-network approach to modelling of the liquid-liquid reacting system. Chemical Engineering Science 54, 2467 – 2473. Montufar, G.F., Pascanu, R., Cho, K., Bengio, Y., 2014. On the number of linear regions of deep neural networks. In Advances in Neural Information Processing Systems 27, 2924–2932. Narasingam, A., Kwon, J.S., 2017. Development of local dynamic mode decomposition with control: Application to model predictive control of hydraulic fracturing. Comput. Chem. Eng. 106, 501– 511. Narasingam, A., Kwon, J.S.I., 2018. Data-driven identification of interpretable reduced-order models using sparse regression. Comput. Chem. Eng. 119, 101 – 111. Narasingam, A., Siddhamshetty, P., Kwon, J.S., 2017. Temporal clustering for order reduction of nonlinear parabolic PDE systems with time-dependent spatial domains: Application to a hydraulic fracturing process. AIChE J. 63, 3818–3831. Narasingam, A., Siddhamshetty, P., Kwon, J.S., 2018. Handling spatial heterogeneity in reservoir

27

parameters using proper orthogonal decomposition based ensemble kalman filter for model-based feedback control of hydraulic fracturing. Ind. Eng. Chem. Res. 57, 39773989. Nascimento, C.A.O., Giudici, R., Scherbakoff, N., 1999. Modeling of industrial nylon-6,6 polymerization process in a twin-screw extruder reactor. ii. neural networks and hybrid models. Journal of Applied Polymer Science 72, 905–912. Nordgren, R., 1972. Propagation of a vertical hydraulic fracture. Soc. Petrol. Eng. J. 12, 306–314. Novotny, E.J., 1977. Proppant transport. In Proceedings of the 52nd SPE annual technical conference and exhibition, Denver, CO. (SPE 6813). Osborne, M.R., 1992. Fisher’s method of scoring. International Statistic Review 86, 271–286. Perkins, T.K., Kern, L.R., 1961. Widths of hydraulic fractures. J. Pet. Technol. 13, 937–949. Preusting, H., Noordover, J., Simutis, R., L¨ ubbert, A., 1996. The use of hybrid modelling for the optimization of the penicillin fermentation process. CHIMIA International Journal for Chemistry 50, 416–417. Psichogios, D.C., Ungar, L.H., 1992. A hybrid neural network-first principles approach to process modeling. AIChE. J. 38, 1499–1511. Qi, H., Zhou, X.G., Liu, L.H., Yuan, W.K., 1999. A hybrid neural network-first principles model for fixed-bed reactor. Chemical Engineering Science 54, 2521 – 2526. Reuter, M., Deventer, J.V., Walt, T.V.D., 1993. A generalized neural-net kinetic rate equation. Chemical Engineering Science 48, 1281 – 1297. Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning representations by back-propagating errors. Nature 323, 533–536. Safavi, A., Nooraii, A., Romagnoli, J., 1999. A hybrid model formulation for a distillation column and the on-line optimisation study. Journal of Process Control 9, 125 – 134.

28

Schubert, J., Simutis, R., Dors, M., Havlik, I., Luebbert, A., 1994a. Hybrid modelling of yeast production processes combination of a priori knowledge on different levels of sophistication. Chemical Engineering & Technology 17, 10–20. Schubert, J., Simutis, R., Dors, M., Havlik, I., Lbbert, A., 1994b. Bioprocess optimization and control: Application of hybrid modelling. Journal of Biotechnology 35, 51 – 68. Siddhamshetty, P., Kwon, J.S., Liu, S., Valko, P.P., 2017. Feedback control of proppant bank heights during hydraulic fracturing for enhanced productivity in shale formations. AIChE J. 64, 1638–1650. Siddhamshetty, P., Yang, S., Kwon, J.S.I., 2018. Modeling of hydraulic fracturing and designing of online pumping schedules to achieve uniform proppant concentration in conventional oil reservoirs. Comput. Chem. Eng. 114, 306 – 317. Sidhu, H.S., Narasingam, A., Siddhamshetty, P., Kwon, J.S., 2018. Model order reduction of nonlinear parabolic pde systems with moving boundaries using sparse proper orthogonal decomposition: application to hydraulic fracturing. Comput. Chem. Eng. 112, 92–100. Simutis, R., Lbbert, A., 1997. Exploratory analysis of bioprocesses using artificial neural network based methods. AIChE J. 13, 479–487. Stinchcombe, M., White, H., 1989. Universal approximation using feedforward networks with nonsigmoid hidden layer activation functions, pp. 613–617. von Stosch, M., Oliveira, R., Peres, J., de Azevedo, S.F., 2014. Hybrid semi-parametric modeling in process systems engineering: Past, present and future. Comput. Chem. Eng. 60, 86 – 101. Su, H., Bhat, N., Minderman, P.A., McAvoy, T.J., 1993. Integrating neural networks with first principles models for dynamic modeling. In IFAC Symposium on Dynamics and Control of Chemical Reactors, Distillation Columns and Batch Processes , 327–332.

29

Teixeira, A.P., Alves, C., Alves, P.M., Corrondo, M.J.T., Oliveira, R., 2007. Hybrid elementary flux analysis/nonparametric modeling: application for bioprocess control. BMC Bioinformatics 8. Thompson, M.L., Kramer, M.A., 1994. Modeling chemical processes using prior knowledge and neural networks. AIChE. J. 40, 1328–1340. Tsay, C., Baldea, M., 2019. 110th anniversary: Using data to bridge the time and length scales of process systems. Ind. Eng. Chem. Res. 58, 16696–16708. Tsen, A.Y.D., Jang, S.S., Wong, D.S.H., Joseph, B., 1996. Predictive control of quality in batch polymerization using hybrid ann models. AIChE J. 45, 455–465. Tulleken, H.J.A.F., 1993. Grey-box modelling and identification using physical knowledge and bayesian techniques. Automatica 29, 285 – 308. Venkatasubramanian, V., 2018. The promise of artificial intelligence in chemical engineering: Is it here, finally? AIChE J. 65, 466–478. Wang, X., Chen, J., Liu, C., Pan, F., 2010. Hybrid modeling of penicillin fermentation process based on least square support vector machine. Chemical Engineering Research and Design 88, 415 – 420. Werbos, P.J., 1988. Backpropagation: past and future. Proceedings of the Second International Conference on Neural Network 1, 343–353. Yang, S., Siddhamshetty, P., Kwon, J.S., 2017. Optimal pumping schedule design to achieve a uniform proppant concentration level in hydraulic fracturing. Comput. Chem. Eng. 101, 138– 147. Zahedi, G., Lohi, A., Mahdi, K., 2011. Hybrid modeling of ethylene to ethylene oxide heterogeneous reactor. Fuel Processing Technology 92, 1725 – 1732.

30

Neural Network xk , uk

p First Principles Model

Figure 1: Hybrid neural network model.

31

xk+1

Input layer

Hidden layers

Figure 2: Deep neural networks.

32

Output layer

Deep Neural Network xk, uk

p First Principles Model

Figure 3: Proposed deep hybrid model.

33

yk

Figure 4: The PKN fracture model.

34

t

Deep Neural Network

Q0(t)

U(t) First Principles Model-Hydraulic Q0(t) Fracturing

Wz(t)

Figure 5: Schematic of deep hybrid model for hydraulic fracturing process

35

Figure 6: Block diagram for Levenberg-Marquardt based deep hybrid model training.

36

Figure 7: Comparison of wellbore widths obtained from the hybrid model, and the training data.

37

Figure 8: Relative error of the hybrid model predictions in comparison to the training data.

38

Figure 9: Comparison of leak off rates predicted from the DNN and actual values calculated using Eq. (41).

39

Figure 10: Comparison of wellbore widths obtained from the DNN-based black box model, and the training data.

40

Figure 11: Comparison of wellbore widths obtained from the hybrid model, black box model and the test data.

41

Figure 12: Comparison of wellbore widths obtained from the hybrid model, first principles model and the actual data.

42

Mohammed Saad Faizan Bangi: Conceptualization, Methodology, Validation, Formal Analysis, Investigation, Writing –Original Draft, Writing – Review & Editing, Visualization, Joseph Kwon: Conceptualization, Formal Analysis, Resources, Writing – Review & Editing, Supervision, Project Administration, Funding Acquisition

Declaration of interests ☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. ☐The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: