A machine learning approach to filtrate loss determination and test automation for drilling and completion fluids

Journal Pre-proof
Sercan Gul, Eric van Oort

PII: S0920-4105(19)31147-7
DOI: https://doi.org/10.1016/j.petrol.2019.106727
Reference: PETROL 106727
To appear in: Journal of Petroleum Science and Engineering
Received Date: 27 February 2019
Revised Date: 17 November 2019
Accepted Date: 19 November 2019

Please cite this article as: Gul, S., van Oort, E., A machine learning approach to filtrate loss determination and test automation for drilling and completion fluids, Journal of Petroleum Science and Engineering (2019), doi: https://doi.org/10.1016/j.petrol.2019.106727. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier B.V.

A Machine Learning Approach to Filtrate Loss Determination and Test Automation for Drilling and Completion Fluids

Sercan Gul and Eric van Oort

Hildebrand Department of Petroleum and Geosystems Engineering, The University of Texas at Austin, 200 E. Dean Keeton St, Austin, TX 78712, USA

Abstract

Drilling fluid property characterization currently involves several manually executed analytical tests, conducted in accordance with American Petroleum Institute (API) recommended practices 13B-1 and 13B-2. Standard (API) and high-pressure, high-temperature (HPHT) filter press units are used for filtrate loss (FL) measurements. However, these test methods have certain important disadvantages. FL tests are conducted at standardized conditions that generally do not reflect downhole oil and gas well conditions in terms of pressure, temperature and filter medium; they present safety concerns due to their elevated pressure and temperature; and they are performed only infrequently by a human mud engineer. The human measurement also introduces concerns about inaccuracies due to inconsistent practices and interpretation bias of the results. In addition, FL measurements are hard to automate given their manual, human-centric operating tasks. In this paper, we investigate whether it is really necessary to automate FL measurements, or whether the (strictly qualitative) information they provide can be obtained in a smarter, more advanced way. For this purpose, the relationship between fluid properties was investigated in detail using machine learning and deep learning techniques. Random forest (RF), XGBoost (XGB), support vector machine (SVM), multilayer perceptron (MLP) and multi-linear regression models were trained and tested to predict API and HPHT FL of water-based muds (WBM) based on fluid property readings of rheology, density, and temperature. A similar approach was also used for HPHT FL prediction of oil-based muds (OBM), taking into account their electrical stability and water content. A key advantage of this approach is that these WBM and OBM fluid properties can be obtained in real time with measurements that are relatively simple and easy to automate (e.g. obtaining fluid density automatically and continuously from a Coriolis mass-flow meter measurement). Thus, real-time assessment of API and HPHT FL becomes possible without ever having to carry out any filter press measurements, thereby also eliminating the need to automate these measurements directly. The models were verified by dedicated laboratory experiments. The developed models estimated API and HPHT FL of WBM, and HPHT FL of OBM, with mean absolute errors (MAE) of 0.56 ml/30min, 1.15 ml/30min and 0.79 ml/30min, respectively, well within the measurement accuracy of the observations by a human mud engineer.

Keywords: machine learning, drilling fluid automation, filtrate loss determination.

1. Introduction

The characterization of the filtration behavior of water-based muds (WBM) and oil-based / synthetic-based muds (OBM/SBM) is important for optimum drilling fluid application. Uncontrolled filtration in drilling fluids causes several issues and problems, such as induced seepage losses into permeable formations (possibly with associated formation damage and impairment of hydrocarbon production when reservoir formations are involved; see e.g. Jiao and Sharma (1992) and Liu and Civan (1996)), an increased differential sticking tendency, and increased frictional pressure losses with correspondingly high equivalent circulating densities (ECD) in the circulating system. Optimal mud maintenance and treatment, facilitated by accurate and frequent measurements, has historically been a crucial enabler of optimized drilling performance.

The conventional way to characterize fluid filtration behavior in the field is through low- and high-pressure FL measurements, both specified in API recommended practices 13B-1 and 13B-2. A summary of the FL test procedure according to API recommended practices is presented in Appendix A. These tests are carried out every day on every drilling rig around the world by numerous dedicated mud engineers, and most drilling practitioners are very familiar with them. Often, there are regulatory requirements associated with reporting FL numbers on the daily mud report. What is perhaps less well known is that these tests have been around in virtually unaltered form for a very long time. Low-pressure and high-pressure filtration tests are already mentioned in the 1st edition of Walter F. Rogers' classic work on Composition and Properties of Oil Well Drilling Fluids (Rogers, 1948). We are therefore talking about equipment and test methods that are now well over 70 years old. This is certainly a testament to the effectiveness of the API standards and the tests themselves in providing useful and practical field guidance. On the other hand, it also indicates that a new approach to filtration characterization may be long overdue. Despite their evident success, current FL testing methods have various downsides. For instance, filter press measurements present the following issues and problems:

• Differential pressure and temperature used during FL testing are standardized and are generally not representative of downhole conditions (if, by sheer coincidence, the exact test conditions are encountered downhole, this will apply to only a very limited location in the well where differential pressure and temperature happen to match);

• The filter medium used (i.e. filter paper or porous disks, e.g. made from aluminum oxide (“aloxite”)) is generally not representative of the porosity and permeability of the downhole geological formations intersected by the drillbit. It is, for instance, quite possible to obtain a good filtration test result while significant downhole losses are occurring, because the filter medium in the test has lower porosity and permeability, and a different pore fluid, than downhole formations. Moreover, the FL test reflects only filtrate losses, not whole mud losses to e.g. formation fractures;

• There are safety issues associated with lab testing at elevated pressure and temperature, with potential hazards for the human operator / mud engineer;

• Since the test is done manually by individuals, there will be interpretive bias in determining the test outcome, as well as the potential for deliberate manipulation of test outcomes;

• Last but not least, FL tests create automation difficulties. Various parties (Saasen et al., 2008; MacPherson et al., 2013) have reported on automation of FL tests, with attempts to mechanize the labor-intensive human tasks associated with the tests. This has created “dishwashing robots”, i.e. mechanized equipment that attempts to replicate and mimic human tasks without considering a more effective alternative way to accomplish a particular set of tasks (in the case of dishwashing, the latter is of course best accomplished by an actual modern dishwasher, a “parallel processor” that is far more efficient at washing dishes than a human). FL testing is cumbersome because of the many intricate tasks that need to be accomplished while following the API protocols: changing out and mounting the filter medium, loading a homogenized fluid sample, applying pressure and temperature, collecting filtrate / condensate and, in particular, cleaning the equipment afterwards to make it ready for the next round of testing. It is our opinion that automating antiquated API test protocols is not a particularly useful practice and will prove to be a dead end in a more automated future.

A superior strategy, in our view, would be to take a “clean-slate” approach and investigate whether filtration information could be provided in a novel and meaningful way, using new methods that deviate from conventional API recommended practices (but could become part of future recommended practices; note that the API practices were never intended to be cast in stone and everlasting). In this paper, we specifically investigate the use of machine learning and data analytics methods for this purpose, in combination with sensors and measurements that lend themselves better to automation.

2. Theory

In the remainder of this paper, we will argue that meaningful API and HPHT FL information can be obtained from mud properties such as density and viscosity. The following paragraphs show that, based on previous work, there is a sound physical basis for correlating FL against such mud parameters, and briefly summarize some of the previous model- and data-driven approaches to predicting FL.

2.1. Mud Filtration

The flow of a fluid through the mud cake can be described by a simplified form of Darcy's relationship (see e.g. Ozbayoglu et al., 2005) using Eqs. 1 and 2.

$$Q_f = C\sqrt{t} + Q_0 \tag{1}$$

$$C = A\sqrt{\frac{2k\,\Delta P}{\mu_c}\left(\frac{f_c}{f_m} - 1\right)} \tag{2}$$

where Qf is the total FL (m³), Q0 is the spurt loss (m³), C is the filtration coefficient defined by Eq. 2, A is the filtration area (m²), ∆P is the differential pressure (Pa), µc is the filtrate viscosity (mPa.s), k is the mud cake permeability (m²), t is the time spent for filtration (s), and fc and fm are the volume fractions of solids in the mud cake and in the mud, respectively. Even though some of the parameters in Eqs. 1 and 2 are known (i.e. A, ∆P, t), the others (k, µc, fc, fm) are generally difficult to quantify using field tests (Ozbayoglu et al., 2005). Eqs. 1 and 2 show that high mud cake permeability increases the FL while high filtrate viscosity decreases it. A fluid with higher viscosity is expected to create a less permeable filter cake, which is expected to decrease the FL. On the other hand, a fluid with higher mud weight is expected to have a larger fraction of solid content in the filter cake, which increases the filter cake permeability and results in higher FL values. In summary, the FL behavior of a drilling fluid appears to depend on the fluid parameters of viscosity and density, as given in Eq. 3.
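The static filtration relation of Eqs. 1 and 2 can be evaluated numerically to illustrate the √t growth of filtrate volume and the opposing effects of cake permeability and filtrate viscosity discussed above. The sketch below is a minimal illustration, not part of the authors' workflow. It assumes the standard static-filtration form Qf = A·√(2k∆P t (fc/fm − 1)/µc) + Q0, SI units throughout (µc is entered in mPa.s and converted to Pa.s), and purely illustrative parameter values (filtration area 45.8 cm², ∆P = 0.69 MPa, µc = 1 mPa.s, k = 1×10⁻¹⁹ m², fc = 0.5, fm = 0.1) that are not taken from this study. The helper name `filtrate_volume` is hypothetical.

```python
import math


def filtrate_volume(t_s, area_m2, dp_pa, mu_c_mpa_s, k_m2, f_c, f_m, spurt_m3=0.0):
    """Static filtrate volume (m^3) after t_s seconds, per Eqs. 1 and 2.

    Hypothetical helper for illustration; all inputs are SI except the
    filtrate viscosity, which is given in mPa.s and converted to Pa.s here.
    """
    mu_c_pa_s = mu_c_mpa_s * 1e-3  # mPa.s -> Pa.s
    # Eq. 2: filtration coefficient C (m^3 / s^0.5)
    c = area_m2 * math.sqrt(2.0 * k_m2 * dp_pa / mu_c_pa_s * (f_c / f_m - 1.0))
    # Eq. 1: filtrate volume grows with the square root of time, plus spurt loss
    return c * math.sqrt(t_s) + spurt_m3


# Illustrative (assumed) inputs, not values from this study
params = dict(area_m2=45.8e-4, dp_pa=0.69e6, mu_c_mpa_s=1.0,
              k_m2=1e-19, f_c=0.5, f_m=0.1)

v30 = filtrate_volume(30 * 60, **params)
v7_5 = filtrate_volume(7.5 * 60, **params)
print(f"30-min filtrate: {v30 * 1e6:.2f} ml")
print(f"V(30 min) / V(7.5 min) without spurt loss: {v30 / v7_5:.2f}")

# Sensitivities discussed in the text
v_high_k = filtrate_volume(30 * 60, **{**params, "k_m2": 2e-19})
v_high_mu = filtrate_volume(30 * 60, **{**params, "mu_c_mpa_s": 2.0})
print(f"doubling k:    {v_high_k / v30:.3f}x")   # increases FL by sqrt(2)
print(f"doubling mu_c: {v_high_mu / v30:.3f}x")  # decreases FL by 1/sqrt(2)
```

With spurt loss neglected, the 30-minute volume is exactly twice the 7.5-minute volume, and doubling k or µc scales the filtrate volume by √2 or 1/√2, respectively.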

$$Q_f' = f(\rho, \mu) \tag{3}$$

where ρ is the fluid density (mud weight, kg/m³), µ is the apparent viscosity (mPa.s) of the fluid, and Qf′ is the total FL (ml/30min) from the filter press.

OBM and SBM are well known to form internal filter cakes during the FL process (Caenn et al., 2016), with FL control provided primarily by their water-in-oil invert emulsions. Crucial properties of these emulsions are their oil-water ratio, which determines e.g. emulsion droplet size and overall mud viscosity, and their electrical stability, which reflects the thermodynamic stability of the emulsion and is determined e.g. by the type and concentration of emulsifiers and the amount of shear the fluid has been subjected to (see e.g. van Oort et al., 2016). Given the important role that the emulsion plays in OBM/SBM FL, it is reasonable to assume that FL behavior in these muds will depend not only on density and viscosity, but also on oil-water ratio and electrical stability.

2.2. Previous Model and Data-Driven Approaches

This section summarizes some of the previous model- and data-driven approaches taken to predict FL behavior. Civan (1998a) developed a physics-based model for incompressible cake filtration which
estimates mud filtrate rates, filtrate volumes and filter cake thickness. The model is a simplified form of models previously published in the literature on drilling fluid FL behavior. Unfortunately, the model requires 20 parameters for simulation, which are difficult to measure using conventional techniques. Civan (1998b) published a second model for compressible filter cakes, including the effects of fine-particle invasion under static and dynamic conditions. This model also requires many complex and unconventional input features, and it involves high computation times because it relies on ordinary differential equations and finite-difference solutions. An FL algorithm (Wu et al., 2001) was developed based on a multi-phase fluid-flow simulator previously developed for reservoir simulation studies. The model was verified with sensitivity studies of WBM and porous rock formations having water and oil as saturating phases. The parameters affecting mud FL were (as expected) formation porosity, permeability, capillary pressure, relative permeability, permeability anisotropy and gravity segregation. However, the study did not explicitly consider mud properties such as density, viscosity or temperature. Toreifi and Rostami (2014) presented an artificial neural network model to predict FL during drilling. Their model uses various drilling and mud data simultaneously (e.g. well location, depth, rate of penetration, formation types, pump rates, mud pressures and mud rheology). Unfortunately, this model is valid only for specific fields, given its dependence on well location and formation information. Ezeakacha et al. (2018) investigated the effects of parameters such as pipe rotation, temperature and the presence of lost circulation materials on dynamic FL. They observed that while an increase in temperature increases mud FL, increases in lost circulation material concentration and fluid viscosity decrease both mud invasion and filtration rates.
Even though there are many studies conducted on modelling the filtration behavior of drilling fluids, none of these can be easily applied in field applications. In the following, we present a data-driven approach for API and HPHT FL characterization of WBM and OBM which can be applied with ease using automated mud parameter measurements.

3. Data Details and Machine Learning Regression Results

Three field datasets of daily mud checks were used in this study. The first dataset included data from 40 wells drilled with WBM, with a total of 1298 lines of data; these data were obtained from clay- and polymer-based muds. The second dataset included data from 50 wells drilled with potassium chloride (KCl) polymer muds, with a total of 1786 lines of data. The third dataset included data from 17 wells drilled with OBM, with a total of 105 lines of data. Various test parameters (including sample temperature, mud weight, viscometer readings and phase content measurements) were available in the datasets. For the data mining analysis in this study, random forest (RF), XGBoost (XGB), support vector machine (SVM), multilayer perceptron (MLP) and multi-linear regression algorithms from the Scikit-learn library (Pedregosa et al., 2011) were used. The optimum tuning parameters of the regression models were determined by grid-search cross-validation (GridSearchCV), minimizing the mean squared error. The ranges of the variables in the WBM and OBM training datasets are provided in Table 1.

Table 1. Minimum and maximum limits of the features from OBM and WBM training datasets used in machine/deep learning analysis.

| Description | OBM min | OBM max | Clay-based WBM min | Clay-based WBM max | KCl polymer WBM min | KCl polymer WBM max |
|---|---|---|---|---|---|---|
| Sample temperature (°C) | N/A | N/A | 18 | 78 | 35 | 86 |
| Mud weight (kg/m³) | 796 | 1282 | 1006 | 1689 | 1054 | 1641 |
| Apparent viscosity (mPa.s) | 15 | 52 | 14 | 124 | 17 | 92 |
| Plastic viscosity (mPa.s) | 9 | 33 | 7 | 73 | 6 | 59 |
| Yield point (Pa) | 1.9 | 9.9 | 3.4 | 24.4 | 4.3 | 26.3 |
| Oil content (%) | 56 | 83.8 | N/A | N/A | 0 | 8 |
| Water content (%) | 11 | 33.5 | 92.5 | 100 | 70.5 | 97.5 |
| Solid content (%) | 7.1 | 15.5 | 0 | 7.5 | 2.5 | 26 |
| Electrical stability (V) | 349 | 1770 | N/A | N/A | N/A | N/A |
| API filtrate loss (ml/30min) | N/A | N/A | 3.2 | 9 | 3.2 | 9 |
| HPHT filtrate loss (ml/30min) | 1.4 | 8.8 | N/A | N/A | 14 | 29.8 |

For the analyses, 80% of the data points from the datasets were used to train the models, while the remaining 20% were used to test model accuracy in terms of the mean absolute error (MAE) and the mean absolute percentage error (MAPE), calculated with Eqs. 4 and 5.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_{i,\mathrm{calculated}} - y_{i,\mathrm{measured}}\right| \tag{4}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\frac{\left|y_{i,\mathrm{measured}} - y_{i,\mathrm{calculated}}\right|}{y_{i,\mathrm{measured}}} \tag{5}$$

where yi-calculated represents each calculated data point, yi-measured represents each measured data point, and n is the total number of data points.

Plastic viscosity, apparent viscosity and yield point of the fluids were calculated from the Couette-type viscometer readings at 600 and 300 rpm using Eqs. 6-8.

$$PV = \theta_{600} - \theta_{300} \tag{6}$$

$$AV = \theta_{300} \tag{7}$$

$$YP = 0.4788\,(2\theta_{300} - \theta_{600}) \tag{8}$$

where PV is the plastic viscosity (mPa.s), AV is the apparent viscosity (mPa.s), YP is the yield point (Pa), and θ600 and θ300 are the dial readings (DR) from an R1B1 Couette-type viscometer at rotary speeds of 600 rpm and 300 rpm, respectively. In the following, results from API FL estimations of KCl polymer muds are shown with algorithm performance comparisons.
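The conversions in Eqs. 6-8 amount to a small feature-engineering step applied to each mud check before regression. The sketch below is a minimal illustration with a hypothetical function name; the dial readings in the usage line are illustrative, and the factor 0.4788 converts the conventional field unit of yield point (lbf/100 ft²) to Pa.

```python
def rheology_from_dial_readings(theta_600, theta_300):
    """Return (PV, AV, YP) from R1B1 Couette-type viscometer dial readings.

    Hypothetical helper implementing Eqs. 6-8:
    PV and AV in mPa.s, YP in Pa.
    """
    pv = theta_600 - theta_300                 # Eq. 6, plastic viscosity
    av = theta_300                             # Eq. 7, reading at 300 rpm (511 1/s)
    yp = 0.4788 * (2 * theta_300 - theta_600)  # Eq. 8, lbf/100 ft^2 converted to Pa
    return pv, av, yp


# Illustrative dial readings (not from the datasets)
pv, av, yp = rheology_from_dial_readings(theta_600=50, theta_300=30)
print(pv, av, round(yp, 3))
```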

3.1. API FL Estimations of KCl Polymer Muds and Algorithm Performance Comparisons

API FL estimations of KCl polymer muds were obtained with five regression algorithms (RF, XGB, SVM, MLP and multi-linear). Plastic viscosity, yield point, initial sample temperature and mud weight were used as features for the predictions of API FL from the field datasets. The same training and testing datasets were used for all models. Because the five algorithms have different internal mechanics, they yielded different results and performance (Osgouei et al., 2015; Raczko and Zagajewski, 2017; Tombul et al., 2019). Further details on the theory of each machine learning algorithm can be found in Appendix B. The statistical performances are compared using boxplots. A boxplot displays the distribution of data based on a five-number summary (minimum, first quartile, median, third quartile and maximum). Figure 1 shows the boxplots of percentage errors obtained from all five algorithms using the KCl polymer mud dataset for API FL estimations.

Figure 1. Boxplots of percentage errors obtained from RF, SVM, MLP, XGB and multi-linear regressions in the prediction of API FL using the KCl polymer mud dataset. The green box shows the interquartile range (IQR) of the distribution of the data. The yellow line represents the median error. The best performance was obtained with the RF algorithm.
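The statistics used to compare the algorithms (MAE from Eq. 4, MAPE from Eq. 5, the median error, and the five-number summary displayed by a boxplot) can be computed in a few lines of NumPy. This is a minimal sketch, assuming that the boxplots are drawn from absolute percentage errors and that all measured FL values are positive; the helper name and the toy values are hypothetical.

```python
import numpy as np


def error_summary(measured, predicted):
    """Error statistics for comparing regression models (Eqs. 4 and 5).

    Hypothetical helper: returns MAE (same units as the data), MAPE and the
    median absolute percentage error (%), and the five-number summary of the
    absolute percentage errors that a min/max boxplot would display.
    """
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    abs_err = np.abs(predicted - measured)
    pct_err = 100.0 * abs_err / measured  # assumes every measured value > 0
    return {
        "MAE": abs_err.mean(),                          # Eq. 4
        "MAPE": pct_err.mean(),                         # Eq. 5
        "median_pct_error": float(np.median(pct_err)),
        # minimum, first quartile, median, third quartile, maximum
        "five_number": np.percentile(pct_err, [0, 25, 50, 75, 100]),
    }


# Toy values in ml/30min (illustrative only, not from the study)
summary = error_summary(measured=[4.0, 5.0, 8.0, 10.0],
                        predicted=[5.0, 5.0, 6.0, 11.0])
print(summary)
```

Plotting libraries do not always place the whiskers at the minimum and maximum; matplotlib's `boxplot`, for example, extends whiskers to 1.5 × IQR by default, so plotted whiskers can differ from the five-number summary when outliers are present.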

Table 2 summarizes the errors in the predictions of API FL of KCl polymer mud in the test data by the five regression algorithms, with their correlation coefficient (R), MAE, MAPE and median error. The results in the table are sorted from best to worst performance. The RF approach yielded superior performance compared to the SVM, MLP, XGB and multi-linear regressions.

Table 2. Summary of errors in the predictions of API FL by RF, SVM, MLP, XGB and multi-linear regressions for KCl polymer mud test data. Results are sorted from the best performance to the worst performance.

| Calculated parameter | Dataset | Features | Regression algorithm | R | MAE (ml/30min) | MAPE (%) | Median error (%) |
|---|---|---|---|---|---|---|---|
| API FL | KCl polymer (WBM) | PV, YP, MW, temperature | RF | 0.86 | 0.56 | 9.15 | 6.11 |
| API FL | KCl polymer (WBM) | PV, YP, MW, temperature | SVM | 0.82 | 0.61 | 10.19 | 7.11 |
| API FL | KCl polymer (WBM) | PV, YP, MW, temperature | MLP | 0.83 | 0.64 | 10.77 | 8.70 |
| API FL | KCl polymer (WBM) | PV, YP, MW, temperature | XGB | 0.81 | 0.67 | 11.42 | 9.17 |
| API FL | KCl polymer (WBM) | PV, YP, MW, temperature | Multi-linear | 0.75 | 0.80 | 13.65 | 11.21 |

In the following sections, only the results from the random forest regression approach are presented, because this approach yielded superior performance for both API and HPHT FL estimations. Experimental verifications were conducted for the API FL estimation model of clay-based muds only, as shown in Section 3.3.

3.2. API Filtrate Loss Estimations of Clay-Based Muds

Plastic viscosity, yield point, initial sample temperature and mud weight were used as features for the predictions of API FL of clay-based muds in random forest regressions from the field dataset. Figure 2 illustrates the predicted and measured values of API FL using random forest regression with a ±1 ml/30min confidence interval.

Figure 2. Predicted vs measured values of API FL of clay-based muds using random forest regression with a ±1 ml/30min confidence interval, as indicated by dotted blue lines.

The random forest regression for API FL provided a fit with an MAE of 0.58 ml/30min and a MAPE of 8.81%. The feature importance obtained from this regression is provided in Figure 3. The figure shows, in decreasing order of importance, plastic viscosity, mud weight, initial sample temperature and yield point in estimating the API FL of this particular WBM.

Figure 3. Feature importance for API FL estimations using plastic viscosity, mud weight, sample temperature and yield point in RF regressions (relative importance of plastic viscosity: 0.48, mud weight: 0.21, temperature: 0.19 and yield point: 0.12).

Figure 4 illustrates a section from one of the trained random forest trees for API filtrate loss estimations. As shown in the figure, decisions are made based on the values of the independent variables (mud weight, plastic viscosity, yield point and sample temperature). Each tree follows these threshold conditions down to a leaf node to arrive at an estimate, and the final API filtrate loss estimate of the random forest is the average of the estimates of all its trees.

Figure 4. A section of the random forest decision tree for the API filtrate loss estimation algorithm.
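The tree-by-tree decision logic shown in Figure 4 can be printed for any fitted Scikit-learn forest with `sklearn.tree.export_text`. The sketch below is not the authors' trained model: it fits a small forest on synthetic data whose ranges loosely follow the clay-based mud column of Table 1 and whose target follows hypothetical trends (FL rising with mud weight and temperature, falling with plastic viscosity), with `max_depth=3` chosen only to keep the printout short. It also checks that the forest's prediction is the mean of its trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import export_text

rng = np.random.default_rng(0)
n = 300
feature_names = ["mud_weight", "plastic_viscosity", "yield_point", "temperature"]

# Synthetic placeholder data (NOT the field dataset)
X = np.column_stack([
    rng.uniform(1006, 1689, n),  # mud weight, kg/m3
    rng.uniform(7, 73, n),       # plastic viscosity, mPa.s
    rng.uniform(3.4, 24.4, n),   # yield point, Pa
    rng.uniform(18, 78, n),      # sample temperature, degC
])
# Hypothetical API FL target (ml/30min), illustration only
y = (6.0 + 0.004 * (X[:, 0] - 1300) - 0.03 * (X[:, 1] - 40)
     + 0.02 * (X[:, 3] - 50) + rng.normal(0.0, 0.3, n))

# Shallow trees only so that the printed rules stay short
forest = RandomForestRegressor(n_estimators=20, max_depth=3, random_state=0)
forest.fit(X, y)

# Threshold rules of the first tree in the forest (compare Figure 4)
print(export_text(forest.estimators_[0], feature_names=feature_names))

# A forest prediction is the average of the predictions of its trees
x_new = X[:1]
per_tree = np.array([tree.predict(x_new)[0] for tree in forest.estimators_])
print(per_tree.mean(), forest.predict(x_new)[0])
```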

3.3. Experimental Validations of API Filtrate Loss Estimation Model

To validate the FL machine learning models, we performed laboratory tests using a 4-unit filter press (∆P = 0.69 MPa) and obtained 28 FL data points for a variety of WBM formulations. The fluids were prepared using different concentrations of barite, bentonite clay, low-gravity clays and xanthan gum polymers mixed in water. Twenty of these tests were carried out at ambient temperature and 8 were performed at 50°C. The plastic viscosity, yield point and mud weight were measured at the same temperatures as the filtrate tests. The viscometer dial reading obtained at a shear rate of 511 s⁻¹ was used to characterize the apparent viscosity. Mud weight data were obtained using a pressurized mud balance. Figure 5 shows API FL vs mud weight for a bentonite clay-based mud. The mud weight was increased from 1078 kg/m³ to 1282 kg/m³ using barite. The experiments showed an increase of API FL with increasing mud weight, as previously reported by Growcock et al. (1994).

Figure 5. Experimental results of API FL vs mud weight for a clay-based WBM using a 4-unit filter press. The tests were performed at ambient temperature with 0.69 MPa differential pressure. The only variable changed was the barite concentration, which resulted in a variation in density. The figure shows a direct, linear relationship between mud FL and mud weight.

Figure 6 shows API FL vs apparent viscosity for a 1138 kg/m³ polymer-based mud. The apparent viscosity of the fluid was increased from 24.2 mPa.s to 44.4 mPa.s using xanthan gum. As also observed by Ezeakacha et al. (2018), the increase in viscosity decreased the static FL.

Figure 6. Experimental results of API FL vs mud apparent viscosity graph for a polymer-based drilling fluid (MW: 1138 kg/m3) using a 4-unit filter press. Variation of the concentration of xanthan gum resulted in a variation in apparent viscosities. The figure shows an inverse relationship of mud FL to the change in mud apparent viscosity.

Figure 7 shows API FL vs bentonite clay concentration for a fluid prepared with 0.3 g/350 ml (equivalent to lbm/bbl) xanthan gum and 65 g/350 ml (lbm/bbl) barite. Adding bentonite clay increased both the apparent viscosity and the mud weight of the base fluid (viscosity from 26 mPa.s to 44.5 mPa.s and mud weight from 1138 kg/m³ to 1186 kg/m³). As expected, the effect of the increased viscosity was more prominent than that of the increased mud weight. As a result, a slight decrease in FL was observed in the experimental results (from 7.0 ml/30min down to 6.4 ml/30min).

Figure 7. Experimental results of API FL vs mud apparent viscosity graph for a clay and polymer-based drilling fluid using a 4-unit filter press. Variation of the concentration of bentonite clay resulted in an increase in apparent viscosities as well as the mud weights.

All experimental measurements were then compared with the predictions of the random forest regression model. Figure 8 shows the predicted vs measured FL of the experimental measurements using random forest regression with a ±1 ml/30min confidence interval, overlaid on the field dataset.

Figure 8. Predicted vs measured values of experimental results of clay and polymer-based fluids for API FL data analyzed using random forest regression with ±1 ml/30min confidence interval as indicated by dotted blue lines. Fluids used in the experiments were prepared using various amounts of barite, bentonite, low-gravity clays and xanthan gum mixed with water. Blue dots: data from WBM field dataset; Red triangles: data from laboratory experiments.

The random forest regression applied to the experimental data provided a fit with an MAE of 0.73 ml/30min and a MAPE of 9.81%. The maximum absolute error observed was 2.05 ml/30min, with a maximum percentage error of 36.67%. It should be noted that the verification experiments were conducted using only a few different types of chemicals (barite, bentonite, low-gravity clays, xanthan gum).

3.4. HPHT Filtrate Loss Estimations

A similar approach was also explored for HPHT filtrate loss estimations of WBM and OBM. All HPHT FL measurements were performed at 121°C, and the rheology measurements in the HPHT FL datasets were conducted at 65.5°C. Because these measurement temperatures were the same for every sample, the sample temperature feature was excluded from this regression approach. Hence, three features (plastic viscosity, yield point and mud weight) were used for the random forest regression estimations of HPHT FL of KCl polymer muds. Figure 9 illustrates the predicted and measured values of HPHT FL using random forest regression. A ±2 ml/30min confidence interval was used in Figure 9 because the measured FL values at HPHT conditions were significantly higher than the API measurements, which in turn produced a larger error margin in the estimations.

Figure 9. Predicted vs measured values of HPHT FL of KCl polymer mud using random forest regression with a ±2 ml/30min confidence interval, as indicated by dotted blue lines.

The random forest regression for HPHT FL of KCl polymer muds provided a fit with an MAE of 1.15 ml/30min and a MAPE of 5.17%. The same approach was also applied to the HPHT FL estimations of OBM. Features characterizing the composition and stability of the emulsion (i.e. electrical stability and water content) were explicitly included in the analysis. Hence, four features (electrical stability, apparent viscosity, mud weight and water content) were used for the random forest regression estimations for OBM. Figure 10 illustrates the predicted and measured values of HPHT FL of OBM using random forest regression with a ±1 ml/30min confidence interval.

Figure 10. Predicted vs measured values of HPHT FL of OBM using random forest regression with a ±1 ml/30min confidence interval, as indicated by dotted blue lines.

The random forest regression for HPHT FL of OBM provided a fit with an MAE of 0.79 ml/30min and a MAPE of 22.57%. The feature importance obtained from this regression is provided in Figure 11. The figure shows, in decreasing order of importance, water content, electrical stability, mud weight and apparent viscosity in estimating the HPHT FL of OBM.

Figure 11. Feature importance for HPHT FL estimations of OBM using water content, electrical stability, mud weight, and apparent viscosity (relative importance of water content: 0.49, electrical stability: 0.27, mud weight: 0.16, and apparent viscosity: 0.08).

Figure 12 illustrates a section from one of the trained random forest trees for HPHT filtrate loss estimations. Similar to Figure 4, decisions are made based on the values of the independent variables. Each tree follows these threshold conditions down to a leaf node to arrive at an estimate, and the final HPHT filtrate loss estimate of the random forest is the average of the estimates of all its trees.

Figure 12. A section of the random forest decision tree for the HPHT filtrate loss estimation algorithm for OBM.

Table 3 summarizes the errors in estimations of API and HPHT FL in the test and experimental data by random forest regressions, with their correlation coefficient (R), MAE, MAPE and median error. The regressions for WBM performed better, which we attribute to the significantly larger number of data points in the WBM training datasets compared to the OBM training dataset (3084 vs 105 lines of data). With the availability of more field data, the performance of the HPHT FL regression model of OBM can be expected to improve.

Table 3. Summary of errors in estimations of API and HPHT FL by random forest machine learning regression for WBM and OBM test data and WBM experimental data. Results sorted from the best performance to the worst performance.

| Calculated parameter | Dataset | Regression method | Features | R | MAE (ml/30min) | MAPE (%) | Median error (%) |
|---|---|---|---|---|---|---|---|
| HPHT FL | KCl polymer (WBM) | RF | PV, YP, MW | 0.825 | 1.152 | 5.173 | 3.154 |
| API FL | Clay-based (WBM) | RF | PV, YP, MW, temperature | 0.810 | 0.582 | 8.807 | 5.373 |
| API FL | KCl polymer (WBM) | RF | PV, YP, MW, temperature | 0.854 | 0.556 | 9.145 | 6.108 |
| API FL | Experimental data (WBM) | RF | PV, YP, MW, temperature | 0.700 | 0.730 | 9.812 | 9.049 |
| HPHT FL | OBM | RF | ES, AV, MW, water content | 0.840 | 0.785 | 22.566 | 17.778 |

Table 4 summarizes the regression tuning parameters for each trained RF algorithm. For parameters not shown in the table, the default values of the Scikit-learn library (Pedregosa et al., 2011) were used.

Table 4. RF regression tuning parameters for each trained model.

Calculated parameter | Dataset | Maximum depth of the tree | Criterion | Minimum samples to be at a leaf node | Minimum samples to split an internal node | Number of trees
HPHT FL | KCl polymer (WBM) | 50 | Mean squared error | 1 | 5 | 100
API FL | Clay-based (WBM) | 50 | Mean squared error | 1 | 4 | 200
API FL | KCl polymer (WBM) | 70 | Mean squared error | 1 | 5 | 100
HPHT FL | OBM | 50 | Mean squared error | 1 | 5 | 10
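One way to carry the Table 4 settings into code is to map each column onto the matching keyword of scikit-learn's RandomForestRegressor (max_depth, min_samples_leaf, min_samples_split, n_estimators, criterion). This is a sketch, not the authors' code: the dictionary name, the helper functions and the column-to-keyword mapping are assumptions. The criterion string "squared_error" is the name used since scikit-learn 1.0 (older releases used "mse"); every other argument is left at the library default, as described above.

```python
from typing import Any, Dict, Tuple

# Table 4 settings keyed by (calculated parameter, dataset). Mapping the table
# columns onto scikit-learn keyword names is an assumption of this sketch.
TABLE_4: Dict[Tuple[str, str], Dict[str, int]] = {
    ("HPHT FL", "KCl polymer (WBM)"): {
        "max_depth": 50, "min_samples_leaf": 1, "min_samples_split": 5, "n_estimators": 100},
    ("API FL", "Clay-based (WBM)"): {
        "max_depth": 50, "min_samples_leaf": 1, "min_samples_split": 4, "n_estimators": 200},
    ("API FL", "KCl polymer (WBM)"): {
        "max_depth": 70, "min_samples_leaf": 1, "min_samples_split": 5, "n_estimators": 100},
    ("HPHT FL", "OBM"): {
        "max_depth": 50, "min_samples_leaf": 1, "min_samples_split": 5, "n_estimators": 10},
}


def rf_kwargs(parameter: str, dataset: str) -> Dict[str, Any]:
    """Keyword arguments for one Table 4 model; anything absent keeps its default."""
    kwargs: Dict[str, Any] = dict(TABLE_4[(parameter, dataset)])  # copy, leave table intact
    kwargs["criterion"] = "squared_error"  # mean squared error (scikit-learn >= 1.0 name)
    return kwargs


def build_rf(parameter: str, dataset: str):
    """Construct the regressor; imports scikit-learn only when actually called."""
    from sklearn.ensemble import RandomForestRegressor
    return RandomForestRegressor(**rf_kwargs(parameter, dataset))
```

For example, build_rf("HPHT FL", "OBM") would return an unfitted 10-tree forest, to be trained with its fit(X, y) method on the OBM features.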

4. Discussion and Conclusions
For the mud systems investigated here, API and HPHT FL can apparently be determined with good accuracy using machine learning techniques. For the WBM systems, API FL can be estimated using measurements of (in declining order of importance) apparent viscosity, density, temperature and yield point. Similarly, for OBM, HPHT FL can be estimated using measurements of (in declining order of importance) electrical stability, water content, density and apparent viscosity. The machine learning relationships will differ between mud types, but once a relationship has been characterized using a sufficient amount of lab and field data, it appears that it can be used with confidence, i.e. with an accuracy similar to that of the measurements and interpretation provided by a human mud engineer. This is indicated by the consistency of our results when comparing field data with lab data for similar mud formulations. The approach offers an elegant way to automate FL characterization, which is used for mud maintenance guidance in the field and is often required by regulators for field reporting. While the current API test protocols are hard, if not impossible, to automate in practice, the measurement of the parameters employed in the machine learning approach appears to be relatively easy to automate:
• Density, e.g. from Coriolis mass flow meters (Gul et al., 2019)
• Rheology parameters, e.g. from automated pipe viscometer measurements (Karimi Vajargah et al., 2016)
• Water content, e.g. from a water cut analyzer (Gul et al., 2019)
• Electrical stability, e.g. from a convenient in-line electrical stability meter
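If the rheology readings are available as Couette viscometer dial readings θ600 and θ300 (see the Nomenclature), PV, YP and AV follow from the standard API Bingham-plastic field relations: PV = θ600 − θ300 (cP, i.e. mPa.s), YP = θ300 − PV (lbf/100 ft²) and AV = θ600/2 (cP). The sketch below assumes a standard rotor-bob-spring combination and converts YP to Pa with 1 lbf/100 ft² ≈ 0.4788 Pa. An automated pipe viscometer reports different raw quantities and would need its own conversion, which is not shown; the function name and example readings are hypothetical.

```python
from typing import Dict

LBF_PER_100FT2_TO_PA = 0.4788  # 1 lbf/100 ft^2 is approximately 0.4788 Pa


def rheology_from_dial(theta600: float, theta300: float) -> Dict[str, float]:
    """PV, YP and AV from Couette viscometer dial readings (standard API relations)."""
    pv = theta600 - theta300        # plastic viscosity, mPa.s (= cP)
    yp_field = theta300 - pv        # yield point, lbf/100 ft^2
    av = theta600 / 2.0             # apparent viscosity, mPa.s (= cP)
    return {"PV": pv, "YP": yp_field * LBF_PER_100FT2_TO_PA, "AV": av}


# Hypothetical readings: PV = 18 mPa.s, YP = 14 lbf/100 ft^2 (about 6.7 Pa), AV = 25 mPa.s
print(rheology_from_dial(theta600=50.0, theta300=32.0))
```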

Objections that can be raised against the machine learning approach and its proposed automation are that it does not involve actual, direct FL measurements, and that it may not work in situations where severe mud contamination compromises FL control. Our response to these objections is as follows. As explained previously, current FL tests are not representative of the downhole environment: the information they provide is highly qualitative and says very little about actual downhole FL behavior. If an actual measurement of FL is necessary, it can be made with sensitive delta-flow measurements (i.e. the difference between the inflow into, and the outflow from, a well) that quantify downhole losses. These are representative measurements of the filtrate lost across actual downhole formations (see e.g. work by Sanfilippo et al., 1997; Beda and Carugo, 2001; Huang et al., 2011; Al-Adwani et al., 2012; Al-Muraikhi et al., 2013), rather than across the idealized filter media used in the API test methods. We support such more representative methods to replace or update the current API FL methods in the future. In the meantime, however, with the API test protocols still in effect, our proposed ML methods offer a convenient, less labor-intensive way to characterize FL when historical FL databases are available (which is the case for practically all field applications, with data provided by the mud suppliers). Furthermore, severe mud contamination events are relatively rare and often preventable occurrences that are expected to become even rarer with progressive automation, oversight and control. In the rare instances when a mud does become severely contaminated, there are usually significant changes in fluid viscosity and/or density as well (e.g. an influx of brine from a water kick in an OBM/SBM system may "flip" the invert emulsion, and the loss of FL control is then accompanied by a sudden, strong increase in mud viscosity).
The consequences of such contamination are therefore expected to be quite apparent in automated mud density and rheology measurements and in the derived parameters, showing clear outlier behavior for the latter. Outlier detection can then be used as a prompt to take corrective action to get mud properties back in check.
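The outlier prompt described above can be sketched with the boxplot rule (Frigge et al., 1989; IQR in the Glossary): a reading is flagged when it falls more than k interquartile ranges below the first quartile or above the third. The text does not state which outlier test is intended, so the function name, the factor k = 1.5 and the example mud weight readings are illustrative assumptions. Quartiles are computed with Python's default statistics.quantiles method; in a live system the function would be applied to a rolling window of recent readings rather than to a whole well history.

```python
import statistics
from typing import List, Sequence


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[int]:
    """Indices of readings outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # needs at least 2 readings
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Illustrative mud weight readings [kg/m3]; the last one mimics a sudden jump.
mud_weight = [1200.0, 1202.0, 1199.0, 1201.0, 1203.0, 1200.0, 1198.0, 1290.0]
print(iqr_outliers(mud_weight))
```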

Nomenclature
$\mathcal{D}_n$ = training dataset
$\Theta_j$ = features for the jth tree in the forest
$\mu_c$ = filtrate cake viscosity [mPa.s]
$\rho_f$ = mud weight [kg/m³]
$\mu_m$ = mud apparent viscosity [mPa.s]
$\Delta P$ = differential pressure [Pa]
$A$ = filtration area [m²]
AV = apparent viscosity [mPa.s]
$\mathbb{E}$ = expectation operator
ES = electrical stability [V]
$j$ = the tree number in random forests
$f_c$ = volume fraction of solids in the mud-cake
$f_m$ = volume fraction of solids in the mud
$k$ = mudcake permeability [m²]
$M$ = the number of randomized regression trees
$m(\mathbf{x})$ = the regression function for a selected data vector $\mathbf{x}$
MAE = mean-absolute error [ml/30min]
MAPE = mean-absolute-percentage error [%]
$n$ = the total number of data points
PV = plastic viscosity [mPa.s]
$Q_0$ = spurt loss [m³]
$Q_f$ = total fluid loss [m³]
$Q_f'$ = total filtrate loss from filter press [ml/30min]
$t$ = time spent for filtration [s]
WC = water content [%]
$Y$ = the response
$y_{i,\text{calculated}}$ = each calculated data point
$y_{i,\text{measured}}$ = each measured data point
YP = yield point [Pa]
$\theta_{300}$ = the dial reading from a Couette-type viscometer at 300 rpm rotation speed [DR]
$\theta_{600}$ = the dial reading from a Couette-type viscometer at 600 rpm rotation speed [DR]

Glossary
APE = Absolute-percentage error
API = American Petroleum Institute
AV = Apparent viscosity
ECD = Equivalent circulating density
FL = Filtrate loss
HPHT = High-pressure, high-temperature
IQR = Inter-quartile range
KCl = Potassium chloride
MAE = Mean-absolute error
MAPE = Mean-absolute-percentage error
MLP = Multilayer perceptron
MW = Mud weight
OBM = Oil-based mud
PV = Plastic viscosity
R = Correlation coefficient
RF = Random forest
SBM = Synthetic-based mud
SVM = Support vector machine
WBM = Water-based mud
XGB = XGBoost
YP = Yield point

Acknowledgments
The authors would like to thank the Rig Automation and Performance Improvement in Drilling (RAPID) group at The University of Texas at Austin and its sponsors for their support and encouragement. Special thanks to Evan Hall and Colin Schroeder for their assistance in performing experiments. Stable Drilling Fluids, Hess Corporation and Anadarko Petroleum Corporation are acknowledged for providing the datasets for this study.

References
Al-Adwani, T., Singh, S., Khan, B., Dashti, J., Ferroni, G., 2012. Real Time Advanced Surface Flow Analysis for Detection of Open Fractures, in: SPE Europec/EAGE Annual Conference. Society of Petroleum Engineers, Copenhagen. https://doi.org/10.2118/154927-MS.
Al-Muraikhi, R., Al-Shamali, A., Alsammak, I.A., Estarabadi, J., Martocchia, A., Ferroni, G., Marai, N., Janbakhsh, M., 2013. Real Time Advanced Flow Analysis for Early Kick/Loss Detection & Identification of Open Fractures, in: SPE Kuwait Oil and Gas Show and Conference. Society of Petroleum Engineers, Kuwait City. https://doi.org/10.2118/167335-MS.
American Petroleum Institute Specifications API 13B-1, 2009. Recommended Practice for Field Testing of Water-Based Drilling Fluids.
American Petroleum Institute Specifications API 13B-2, 2005. Recommended Practice for Field Testing of Oil-Based Drilling Fluids.
Beda, G., Carugo, C., 2001. Use of Mud Microloss Analysis While Drilling to Improve the Formation Evaluation in Fractured Reservoir, in: SPE Annual Technical Conference and Exhibition. Society of Petroleum Engineers, New Orleans. https://doi.org/10.2118/71737-MS.
Biau, G., Scornet, E., 2016. A random forest guided tour. TEST. 25, 197–227. https://doi.org/10.1007/s11749-016-0481-7.
Breiman, L., 2001. Random Forests. Machine Learning. 45, 5–32. https://doi.org/10.1023/A:1010933404324.
Caenn, R., Darley, H.C.H., Gray, G.R., 2016. Composition and Properties of Drilling and Completion Fluids, 7th ed. Gulf Professional Publishing.
Chen, T., Guestrin, C., 2016. XGBoost: A Scalable Tree Boosting System, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, San Francisco. https://doi.org/10.1145/2939672.2939785.
Civan, F., 1998a. Incompressive cake filtration: Mechanism, parameters, and modeling. AIChE J. 44, 2379–2387. https://doi.org/10.1002/aic.690441106.
Civan, F., 1998b. Practical model for compressive cake filtration including fine particle invasion. AIChE J. 44, 2388–2398. https://doi.org/10.1002/aic.690441107.
Ezeakacha, C.P., Salehi, S., Bi, H., 2018. A New Approach to Characterize Dynamic Drilling Fluids Invasion Profiles in Application to Near-Wellbore Strengthening Effect, in: IADC/SPE Drilling Conference and Exhibition. Society of Petroleum Engineers, Fort Worth. https://doi.org/10.2118/189596-MS.
Frigge, M., Hoaglin, D.C., Iglewicz, B., 1989. Some Implementations of the Boxplot. The American Statistician. 43, 50–54. https://doi.org/10.2307/2685173.
Growcock, F.B., Ellis, C.F., Schmidt, D.D., 1994. Electrical Stability, Emulsion Stability, and Wettability of Invert Oil-Based Muds. SPE Drilling & Completion. 9, 39–46. https://doi.org/10.2118/20435-PA.
Gul, S., van Oort, E., Mullin, C., Ladendorf, D., 2019. Automated Surface Measurements of Drilling Fluid Properties: Field Application in the Permian Basin, in: SPE/AAPG/SEG Unconventional Resources Technology Conference. Society of Petroleum Engineers, Denver. https://doi.org/10.15530/urtec2019-964.
Huang, J., Griffiths, D.V., Wong, S., 2011. Characterizing Natural-Fracture Permeability From Mud-Loss Data. SPE J. 16, 111–114. https://doi.org/10.2118/139592-PA.
Jiao, D., Sharma, M.M., 1992. Formation damage due to static and dynamic filtration of water-based muds, in: SPE Formation Damage Control Symposium. Society of Petroleum Engineers, Lafayette, pp. 491–501. https://doi.org/10.2118/23823-MS.
Karimi Vajargah, A., Sullivan, G., van Oort, E., 2016. Automated Fluid Rheology and ECD Management, in: SPE Deepwater Drilling and Completions Conference. Society of Petroleum Engineers, Galveston. https://doi.org/10.2118/180331-MS.
Liu, X., Civan, F., 1996. Formation Damage and Filter Cake Buildup in Laboratory Core Tests: Modeling and Model-Assisted Analysis. SPE Form. Eval. 26–30. https://doi.org/10.2118/25215-PA.
Macpherson, J.D., de Wardt, J.P., Florence, F., Chapman, C.D., Zamora, M., Laing, M.L., Iversen, F.P., 2013. Drilling Systems Automation: Current State, Initiatives and Potential Impact. SPE Drill. Complet. 28, 296–308. https://doi.org/10.2118/166263-PA.
Osgouei, R.E., Ozbayoglu, A.M., Ozbayoglu, E.M., Yuksel, E., Eresen, A., 2015. Pressure drop estimation in horizontal annuli for liquid-gas 2 phase flow: Comparison of mechanistic models and computational intelligence techniques. Computers and Fluids. 112, 108–115. https://doi.org/10.1016/j.compfluid.2014.11.003.
Ozbayoglu, E.M., Gunes, C., Apak, E.C., Kok, M.V., Iscan, A.G., 2005. Empirical correlations for estimating filtrate volume of water based drilling fluids. Pet. Sci. Technol. 23, 423–436. https://doi.org/10.1081/LFT-200031037.
Pedregosa, F., Weiss, R., Brucher, M., et al., 2011. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12, 2825–2830.
Raczko, E., Zagajewski, B., 2017. Comparison of support vector machine, random forest and neural network classifiers for tree species classification on airborne hyperspectral APEX images. European Journal of Remote Sensing. 50, 144–154. https://doi.org/10.1080/22797254.2017.1299557.
Rogers, W.F., 1948. Composition and Properties of Oil Well Drilling Fluids, 1st ed. Gulf Publishing Company.
Saasen, A., Omland, T.H., Ekrene, S., Breviere, J., Villard, E., Tehrani, A., Cameron, J., Freeman, M., Growcock, F., Patrick, A., Jørgensen, T., Reinholt, F., Scholz, N., Amundsen, H.E.F., Steele, A., 2008. Automatic Measurement of Drilling Fluid and Drill Cuttings Properties, in: IADC/SPE Drilling Conference. Society of Petroleum Engineers, Orlando. https://doi.org/10.2118/112687-PA.
Sanfilippo, F., Santarelli, F.J., Bezzola, C., 1997. Characterization of conductive fractures while drilling, in: SPE European Formation Damage Conference. Society of Petroleum Engineers, The Hague. https://doi.org/10.2118/38177-MS.
Smola, A.J., Scholkopf, B., 2003. A tutorial on support vector regression. Statistics and Computing. 14, 199–222. https://doi.org/10.1023/B:STCO.0000035301.49549.88.
Tombul, H., Ozbayoglu, A.M., Ozbayoglu, E.M., 2019. Computational Intelligence Models for PIV based Particle (Cuttings) Direction and Velocity Estimation in Multi-Phase Flows. Journal of Petroleum Science and Engineering. 172, 547–558. https://doi.org/10.1016/j.petrol.2018.09.071.
Toreifi, H., Rostami, H., Manshad, A.K., 2014. New method for prediction and solving the problem of drilling fluid loss using modular neural network and particle swarm optimization algorithm. J. Pet. Explor. Prod. Technol. 4, 371–379. https://doi.org/10.1007/s13202-014-0102-5.
van Oort, E., Hoxha, B.B., Yang, L., Hale, A., 2016. Automated Drilling Fluid Analysis using Advanced Particle Size Analyzers, in: IADC/SPE Drilling Conference and Exhibition. Society of Petroleum Engineers, Fort Worth. https://doi.org/10.2118/178877-MS.
Wu, J., Torres-Verdin, C., Sepehrnoori, K., Delshad, M., 2001. Numerical Simulation of Mud Filtrate Invasion in Deviated Wells, in: SPE Annual Technical Conference and Exhibition. Society of Petroleum Engineers, New Orleans. https://doi.org/10.2118/71739-MS.


Appendix A – Summary of HPHT FL Measurement Procedure
Drilling fluid property characterization involves several manually executed analytical tests, conducted in accordance with American Petroleum Institute (API) recommended practices 13B-1 and 13B-2. Standard (API) and high-pressure, high-temperature (HPHT) filter press units are used for filtrate loss (FL) measurements. A summary of the HPHT FL test procedure according to the API standards is as follows (API 13B-1, 2009; API 13B-2, 2005):
1. Place a thermometer in the heating jacket.
2. Adjust the thermostat and preheat the jacket to 6°C above the desired test temperature (most tests are conducted at 121°C).
3. Mix the mud sample for at least 5 minutes using the field mixer.
4. Pour the mud sample into the filter cell, leaving at least 2.5 cm of space to allow for expansion.
5. Install the filter paper in the cell.
6. Complete the assembly of the filter cell (tighten all the nuts carefully) and close all the valves.
7. Place the cell in the heating jacket (when the heating jacket is at the desired temperature).
8. Insert the thermometer into the thermometer well of the filter cell.
9. Connect the high-pressure filtrate collection tube onto the lower part of the bottom valve stem (make sure the collection tube is dry).
10. Connect a pressure source with a regulator to the upper valve.
11. Connect a pressure source with a regulator to the bottom valve.
12. Set the lower regulator to the minimum back-pressure value for the test temperature.
13. Set the upper regulator to 690 kPa higher than the minimum back-pressure value.
14. Maintain constant pressure until the test temperature is reached.
15. When the desired temperature is reached, open the bottom valve stem and increase the upper regulator pressure to 3450 kPa higher than the back-pressure.
16. Start the timer.
17. Keep the temperature constant during the test.
18. Collect the filtrate in the graduated cylinder.
19. Read the total filtrate volume collected in 30 minutes.
20. After 30 minutes, close the lower and upper valve stems.
21. Bleed off the pressure according to the manufacturer's instructions.
22. Disconnect the pressurization system.
23. Remove the cell from the heating jacket and allow the cell to cool below 50°C.
24. Bleed the pressure and disassemble the cell.
25. Pour the liquid from the cell.
26. Remove the filter paper with the filter cake and measure the filter cake thickness at its center to the nearest millimeter.
27. Report the filtrate volume and filter cake thickness in mud reports.
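Several steps of the procedure above reduce to fixed offsets from two inputs, the test temperature and the minimum back-pressure. A minimal sketch that derives those setpoints follows. The class and function names are hypothetical, pressures are gauge values in kPa, and the minimum back-pressure must be taken from the API tables for the chosen test temperature; the 690 kPa in the example is an assumed illustrative input, not a value stated in this appendix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HphtSetpoints:
    jacket_preheat_c: float          # step 2: test temperature + 6 degC
    back_pressure_kpa: float         # step 12: minimum back-pressure (input)
    top_heating_kpa: float           # step 13: back-pressure + 690 kPa while heating
    top_test_kpa: float              # step 15: back-pressure + 3450 kPa during the test
    test_duration_min: float = 30.0  # steps 16-20
    safe_handling_c: float = 50.0    # step 23: cool below this before disassembly


def hpht_setpoints(test_temp_c: float, min_back_pressure_kpa: float) -> HphtSetpoints:
    """Setpoints implied by the Appendix A procedure (gauge pressures in kPa)."""
    return HphtSetpoints(
        jacket_preheat_c=test_temp_c + 6.0,
        back_pressure_kpa=min_back_pressure_kpa,
        top_heating_kpa=min_back_pressure_kpa + 690.0,
        top_test_kpa=min_back_pressure_kpa + 3450.0,
    )


s = hpht_setpoints(test_temp_c=121.0, min_back_pressure_kpa=690.0)
print(s.jacket_preheat_c, s.top_heating_kpa, s.top_test_kpa)  # 127.0 1380.0 4140.0
```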

Appendix B – Machine Learning Regressions

B.1. Multi-linear regression
Multi-linear regression is one of the most widely used prediction techniques in statistical analysis. It uses a linear model to estimate the regression coefficient associated with each feature in order to predict the desired response (Eq. B1):

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_m x_m \tag{B1}$$

where m is the size of the vector $\mathbf{x}$ (independent variables) and y is the dependent variable. The regression coefficients $\beta_0$ to $\beta_m$ are chosen by minimizing the sum of squares E (Eq. B2) (Tombul et al., 2019).

$$E = \sum_{i=1}^{n} \left( y_i - \left( \sum_{j=1}^{m} \beta_j x_{ij} + \beta_0 \right) \right)^2 \tag{B2}$$

where n is the total number of data points and $x_{ij}$ is the value of the jth independent variable for the ith data point.

B.2. Random forest regression (RF)
The random forest regression approach, first proposed by Breiman (2001), combines several randomly generated tree predictors and provides a final estimate as the average of the individual trees' results. Random forest regression is a nonparametric estimation method. Training uses a dataset $\mathcal{D}_n = ((\mathbf{X}_1, Y_1), \ldots, (\mathbf{X}_n, Y_n))$ of n pairs $(\mathbf{X}, Y)$, each consisting of a vector of independent variables $\mathbf{X}$ and the corresponding response $Y$. For prediction, the input is an observed vector $\mathbf{x}$, and the goal is to estimate the response $Y$ by estimating the regression function given in Eq. B3 (Biau and Scornet, 2016):

$$m(\mathbf{x}) = \mathbb{E}[Y \mid \mathbf{X} = \mathbf{x}] \tag{B3}$$

where $m(\mathbf{x})$ is the regression function for a selected data vector $\mathbf{x}$ (with components such as plastic viscosity, yield point, mud weight and temperature), $\mathbb{E}$ denotes the expectation and $Y$ is the response (i.e. the desired FL output value). A random forest is a combination of M randomized tree predictors. For the jth tree in the forest, the predicted value at $\mathbf{x}$ is $m_n(\mathbf{x}; \Theta_j, \mathcal{D}_n)$, where $\Theta_j$ stands for the features of the algorithm for the jth tree. Averaging the estimates of all M trees gives the final random forest prediction, shown in Eq. B4 (Biau and Scornet, 2016).

$$m_{M,n}(\mathbf{x}; \Theta_1, \ldots, \Theta_M, \mathcal{D}_n) = \frac{1}{M} \sum_{j=1}^{M} m_n(\mathbf{x}; \Theta_j, \mathcal{D}_n) \tag{B4}$$

where m is the regression function, M is the number of randomized regression trees, $\Theta_j$ are the features for the jth tree in the forest, and $\mathcal{D}_n$ is the training dataset. Further details on constructing the individual random forest trees, introducing randomness, parameter tuning and variable importance can be found in Breiman (2001) and Biau and Scornet (2016).

B.3. XGBoost regression (XGB)
XGBoost is a scalable end-to-end gradient tree boosting system. The XGBoost regression model is trained in an additive manner, and the foundation of the algorithm is the same as that of gradient tree boosting. In gradient tree boosting, however, the most time-consuming step is sorting the data. To reduce this cost, XGBoost stores the data in in-memory units called blocks. To prevent overfitting, the algorithm incorporates a regularized model, while simplifying the objective and the algorithm for parallelization. The method also uses an exact greedy algorithm, approximate global and local algorithms, blocks for out-of-core computation, and sparsity-aware split finding (Chen and Guestrin, 2016).

B.4. Support vector machine (SVM)

Support vector machine (SVM) was originally developed as a classification algorithm. For two classes, it finds the separating hyperplane that maximizes the margin, i.e. the distance between the hyperplane and the closest data points of each class. Assuming that the classes are linearly separable, these closest data points, the support vectors, alone determine the position of the hyperplane. The same principle applies to support vector regression, which fits a function that keeps as many data points as possible within a margin around it. Support vector regressions can use linear or non-linear kernel functions (e.g. polynomial or radial basis function kernels). Further details on SVM can be found in the work published by Smola and Scholkopf (2003).

B.5. Multilayer perceptron (MLP)
Multilayer perceptron (MLP) is an artificial neural network algorithm, used here for regression. The learning algorithm used in this approach is called "backpropagation". In this method, the input signal is fed forward through the network, which produces an estimate. The network then calculates the error of that estimate and propagates it backward, from the output layer towards the input layer, to update the network weights. This is repeated until the error is minimized (Osgouei et al., 2015).
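Eqs. B2 and B4 are the formulas in this appendix that carry actual arithmetic. The pure-Python sketch below (standard library only, written for this illustration rather than taken from the study's code, which relied on Scikit-learn) minimizes E in Eq. B2 by solving the normal equations, evaluates Eq. B1 with the fitted coefficients, and forms the random forest estimate of Eq. B4 as the plain average of per-tree predictions. All function names are hypothetical.

```python
from typing import List, Sequence


def fit_multilinear(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients [beta_0, ..., beta_m] minimizing E in Eq. B2.

    Solves the normal equations (A^T A) beta = A^T y by Gaussian elimination,
    where A is X with a leading column of ones for the intercept beta_0.
    """
    A = [[1.0, *row] for row in X]
    p = len(A[0])
    # Augmented normal-equation system [A^T A | A^T y].
    M = [[sum(a[r] * a[c] for a in A) for c in range(p)]
         + [sum(a[r] * yi for a, yi in zip(A, y))] for r in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; normal equations are singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (M[r][p] - sum(M[r][c] * beta[c] for c in range(r + 1, p))) / M[r][r]
    return beta


def predict_multilinear(beta: Sequence[float], x: Sequence[float]) -> float:
    """Eq. B1: y = beta_0 + sum_j beta_j * x_j."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def forest_average(tree_predictions: Sequence[float]) -> float:
    """Eq. B4: the forest estimate is the mean of the M per-tree estimates."""
    return sum(tree_predictions) / len(tree_predictions)
```

For noise-free data generated as y = 1 + 2x1 + 3x2, fit_multilinear recovers coefficients of approximately [1, 2, 3].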

Highlights
• Current API test protocols for fluid filtrate loss tests are now more than 70 years old.
• Current API test protocols for fluid filtrate loss tests are very difficult to automate.
• Machine learning allows for determination and test automation of API and HPHT filtrate loss.
• Filtrate loss is related to viscosity and density for drilling and completion fluids.
• Electrical stability and water content also determine HPHT filtrate loss behavior.

Author Contribution Statement Sercan Gul: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing – Original draft preparation, Writing – Reviewing and Editing, Visualization Eric van Oort: Conceptualization, Resources, Data curation, Writing – Original draft preparation, Writing – Reviewing and Editing, Visualization, Supervision, Project administration

Declaration of interests ☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. ☐The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: