FTIR, 1H and 13C NMR data fusion to predict crude oils properties


Fuel xxx (xxxx) xxxx

Contents lists available at ScienceDirect

Fuel journal homepage: www.elsevier.com/locate/fuel

Full Length Article


Mariana K. Moro (a), Álvaro C. Neto (a), Valdemar Lacerda Jr. (a), Wanderson Romão (a,b), Luiz S. Chinelatto Jr. (c), Eustáquio V.R. Castro (a), Paulo R. Filgueiras (a,⁎)

(a) Center of Competence in Petroleum Chemistry (NCQP), Laboratory of Research and Development of Methodologies for Analysis of Oils – LabPetro, Chemistry Department, Universidade Federal do Espírito Santo, Av. Fernando Ferrari, 514, CEP: 29075-910 Vitória, Espírito Santo, Brazil
(b) Instituto Federal de Educação, Ciência e Tecnologia do Espírito Santo, Vila Velha, ES, Brazil
(c) Chemistry Division, Research and Development Center (Cenpes), PETROBRAS, Av. Horácio Macedo, 950, 21941-915, Rio de Janeiro, Brazil

ARTICLE INFO

Keywords: Data fusion; Crude oil; NMR; FTIR; PLS

ABSTRACT

Fusing data from different analytical sources can estimate the physicochemical properties of petroleum more accurately than a single analytical technique, because the outputs of different instrumental techniques can carry complementary information and act synergistically during calibration. In this paper, we investigate the potential of data fusion strategies for estimating seven crude oil properties: sulphur content (S), total nitrogen content (TN), basic nitrogen content (BN), total acid number (TAN), and saturated (SAT), aromatic (ARO) and polar (POL) contents. We used 127 crude oil samples, split into 70% for calibration and 30% for prediction. Partial least squares (PLS) regression models were constructed from Fourier transform mid-infrared (FTIR) and 1H and 13C nuclear magnetic resonance (NMR) spectra. Data fusion models were also built by fusing the spectra at low and mid level in different combinations. Mid-level fusion usually increased the accuracy of the models, whereas low-level fusion brought no significant improvement. Using PLS mid-level fusion, we estimated S, TN, BN, TAN, SAT, ARO and POL contents with average prediction errors of 0.064 wt%, 0.049 wt%, 0.0070 wt%, 0.16 mgKOH·g−1, 5.34 wt%, 3.66 wt% and 6.58 wt%, respectively, with coefficients of determination of 0.87, 0.78, 0.98, 0.91, 0.79, 0.67 and 0.63 for the prediction set, using 4, 3, 3, 3, 2, 4 and 2 latent variables, respectively. Overall, the results are promising, and mid-level fusion proved to be the best strategy, usually improving the accuracy of the models.

⁎ Corresponding author. E-mail address: [email protected] (P.R. Filgueiras).

https://doi.org/10.1016/j.fuel.2019.116721
Received 16 September 2019; Received in revised form 17 November 2019; Accepted 20 November 2019
0016-2361/ © 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Chemometrics enables the extraction of important information from spectroscopic data. Physicochemical properties of petroleum can be estimated with mathematical models that correlate them with analytical data [1]. However, a data set from a single instrumental source may not carry enough information, producing models with limited predictive power [2]. In this case, a data fusion strategy can be used to combine data from two or more analytical sources, providing complementary and synergistic information and consequently generating better estimates than a single technique [3–6].

In 1999, Steinmetz et al. proposed a methodology that classifies data fusion into three levels: low, mid and high. In low-level fusion, all data from the different sources are concatenated into a single matrix prior to multivariate modeling. The fused data must be similar in scale and variance; otherwise, one data source may predominate over the others. In mid-level data fusion, relevant information is first extracted from each data source by techniques such as Principal Component Analysis (PCA) or PLS, and the extracted information is then merged for multivariate calibration [4]. This provides a significant reduction in the number of variables and eliminates noise and redundancy [5]. In the high-level strategy, a prediction model is built from each individual source and the results are combined to obtain a final decision model [4]. Especially for this strategy, data must be carefully processed to avoid information loss [5].

Some studies have explored the impact of data fusion levels on building predictive or discriminative models [6–8]. Li et al. (2018) fused mid- and near-infrared data at low, mid and high level to build models for the geographical traceability of Panax notoginseng. Individual (non-fusion) spectral data were unable to classify the samples satisfactorily. Among the three strategies, high-level data fusion provided the best classification results and low-level fusion the worst. According to the authors, at low-level fusion the raw data may contain redundant information, noise or interference that obstructs the synergistic effect between data sets [7].

In recent years, an extensive body of work using data fusion has been published in the analytical chemistry field. However, most of it focuses on food analysis, a type of matrix that is usually complex. Borràs et al. reported 77 studies that apply data fusion to several different types of food. Among these, 43, 25 and 9 studies employed low-, mid- and high-level data fusion, respectively, so the high-level strategy is less explored than the others [5]. Most of these studies aimed to identify the origin of foods such as olive oil [9], palm oil [10], orange juice [11], beer [12] and red wine [13], among others. Besides classification models, several studies have estimated properties of food products [14–16], and their results show that data fusion usually provides better predictive models. Data fusion has been widely studied in the food field but is less explored in other areas, such as medicine and biology [17].

Given this context, this study aimed to develop analytical methodologies for determining physicochemical properties of crude oil using chemometric analysis combined with data fusion techniques. Data supplied by three spectroscopic techniques (FTIR, 1H NMR and 13C NMR) were modeled individually. To exploit their synergy and complementary information, the data were also modeled after being fused at low and mid level, in several different combinations.

2. Experiment

2.1. Physicochemical analysis

In this paper we used 127 crude oil samples from the Brazilian sedimentary coastal basin. These oils presented American Petroleum Institute (API) gravity ranging from 11.4 to 54.0 °API, comprising examples of light to heavy oils according to the API classification [18]. The oils were characterized by standard methods, as described below.

Total sulphur (S) content was determined according to ASTM D4294-16 [19]. Energy dispersive X-ray fluorescence spectrometry was applied, and the characteristic excitation of each sample was compared to that of standard sulphur samples.

Total nitrogen (TN) content was determined by the standard ASTM D4629-17 [20] method. The assay consists of combustion of the sample, followed by nitrogen determination by chemiluminescence, using a standard calibration curve.

Basic nitrogen (BN) content was determined according to the UOP 269 [21] method, by potentiometric titration, using a pH electrode and perchloric and acetic acids as titrant and solvent, respectively.

To determine the total acid number (TAN) of the oil samples, a potentiometric titration using the selective electrode Solvotrode easyClean was carried out according to the ASTM D664 [22] standard. The results were expressed in milligrams of potassium hydroxide required to neutralize one gram of sample.

Saturate (SAT), aromatic (ARO) and polar (POL) contents were determined by a modified ASTM D2549-02 [23] method. Resin and asphaltene contents were determined as a single property, named polar. A 25 cm chromatographic column with successive elutions of 200 mL of hexane, 200 mL of hexane:dichloromethane (1:1) and 200 mL of methanol separated the saturated, aromatic and polar fractions, respectively. Silica gel, activated at 120 °C for 12 h, was used as the stationary phase. The silica particle size ranged from 230 to 400 mesh.

2.2. Spectral analysis

Fourier Transform Infrared (FTIR) spectra of all samples were obtained in a PerkinElmer spectrometer equipped with an attenuated total reflection accessory with a zinc selenide crystal. Spectra were measured in the region between 4000 and 650 cm−1 with a resolution of 4 cm−1 and 32 scans per sample.

The nuclear magnetic resonance (NMR) spectra of hydrogen and carbon were obtained in a Varian/Agilent spectrometer, VNMRS400 model, operating at 9.4 T, using a 5 mm BroadBand 1H/X/D NMR probe. The instrumental conditions for hydrogen spectra were a frequency of 399.73 MHz, spectral width of 6410.3 Hz, acquisition time of 2.556 s, relaxation delay of 1 s, flip angle of 90° and 64 transients. The instrumental conditions for carbon spectra were a frequency of 100.51 MHz, spectral width of 25510.2 Hz, acquisition time of 1.285 s, relaxation delay of 7 s, flip angle of 90° (14.2 μs), 1000 transients and decoupling mode nny. The samples were dissolved in deuterated chloroform, CDCl3, with 0.05 mol·L–1 Cr(acac)3.

FTIR and NMR spectroscopic techniques provide molecular information. This information may be directly related to the modeled physicochemical property or related to other components that are proportional to the modeled property. FTIR provides direct information on the C–S and S–H bonds, which are related to total sulphur content, and on C–H and N–H bonds, which are related to total nitrogen content. For these same properties, NMR provides signals that are only correlated with them (indirect prediction), since these atoms are not measured directly. However, for the polar content, 13C NMR has great potential, as it provides information on carbons in polyaromatic structures that are directly related to the modeled property. Thus, using chemometrics, it is possible to estimate some properties of petroleum.

2.3. Chemometrics

The procedures described in this section were followed to construct calibration models for all seven properties evaluated in this study: sulphur content, total and basic nitrogen contents, total acid number, and saturated, aromatic and polar contents. For each property, PLS prediction models were built from the individual FTIR, 13C NMR and 1H NMR calibration sets and from four different spectra combinations, FTIR + 1H NMR, FTIR + 13C NMR, 1H NMR + 13C NMR, and FTIR + 1H NMR + 13C NMR, at low- and mid-level fusion. The samples were split into two sets, 70% for calibration (model development) and 30% for prediction, using the Rank-KS algorithm [24]. The calibration set samples were used to construct models that were later applied to the prediction set samples. The multivariate models were compared by their accuracy on the prediction set.

2.3.1. Non-fusion modeling

Before modeling, the NMR spectra were aligned using the icoshift (Interval Correlation Optimized Shifting) algorithm [25]. After that, based on previous tests, the FTIR and NMR spectra were pretreated with Multiplicative Scatter Correction (MSC) and Standard Normal Variate (SNV), respectively, and then autoscaled. Subsequently, PLS was carried out for each of the FTIR, 1H NMR and 13C NMR spectral data sets, using the ChemoAC Toolboxes, a free package provided by the Chemometrics Group of the Free University of Brussels for the MatLab (The MathWorks, Inc.) environment. The optimal number of latent variables was determined by 5-fold venetian blinds cross validation. To reduce the chance of overfitting, the number of latent variables for non-fused models was limited to 10.

The models were evaluated by their coefficients of determination for the calibration (RC2) and prediction (RP2) sets and by their accuracy, measured by the root mean square error for the calibration (RMSEC) and prediction (RMSEP) sets, which summarizes the average difference between reference and predicted values. To put the accuracy in perspective, %RMSEP was also calculated by Eq. (1) [26], where $\bar{y}_{prediction}$ is the mean of the reference values in the prediction set.

$$\%RMSEP = \frac{RMSEP}{\bar{y}_{prediction}} \times 100 \tag{1}$$
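As an illustration of the figures of merit described above, the following Python sketch computes the root mean square error, a coefficient of determination and %RMSEP. It assumes that the denominator of Eq. (1) is the mean reference value of the prediction set and uses one common definition of the coefficient of determination; the function names and the toy values are hypothetical and are not taken from the paper's data.

```python
import math


def rmse(y_ref, y_pred):
    """Root mean square error between reference and predicted values."""
    n = len(y_ref)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(y_ref, y_pred)) / n)


def r_squared(y_ref, y_pred):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_ref = sum(y_ref) / len(y_ref)
    ss_res = sum((r - p) ** 2 for r, p in zip(y_ref, y_pred))
    ss_tot = sum((r - mean_ref) ** 2 for r in y_ref)
    return 1.0 - ss_res / ss_tot


def percent_rmsep(y_ref, y_pred):
    """Eq. (1): RMSEP relative to the mean reference value (assumed), in %."""
    return 100.0 * rmse(y_ref, y_pred) / (sum(y_ref) / len(y_ref))


# Toy prediction set with made-up numbers (for example, a content in wt%).
y_ref = [0.10, 0.20, 0.30, 0.40]
y_pred = [0.12, 0.18, 0.33, 0.37]
print(rmse(y_ref, y_pred), r_squared(y_ref, y_pred), percent_rmsep(y_ref, y_pred))
```

Applied to the calibration samples, the same `rmse` function would give RMSEC; applied to the prediction samples, it gives RMSEP.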

Furthermore, the residuals were evaluated after modeling to check for systematic errors, using a statistical bias test at the 95% confidence level, and for trends, using the permutation test described by Filgueiras et al. (2014) [27].
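The bias test can be sketched as follows. This is a minimal sketch of one common form of the test, a t-statistic on the mean prediction residual as used in ASTM E1655, which is assumed here to be the form applied; the function name and the residuals are hypothetical.

```python
import math


def bias_t_statistic(y_ref, y_pred):
    """Return (bias, t_cal) for the prediction residuals."""
    residuals = [r - p for r, p in zip(y_ref, y_pred)]
    n = len(residuals)
    bias = sum(residuals) / n
    # Standard deviation of the residuals around the bias (n - 1 degrees of freedom).
    sdv = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))
    t_cal = abs(bias) * math.sqrt(n) / sdv
    return bias, t_cal


# Made-up reference and predicted values for five prediction samples.
y_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.9, 5.1]
bias, t_cal = bias_t_statistic(y_ref, y_pred)

# Two-sided 95% critical t value for n - 1 = 4 degrees of freedom.
T_CRIT = 2.776
print(bias, t_cal, t_cal < T_CRIT)
```

When `t_cal` is below the critical value for the size of the prediction set, there is no evidence of systematic error at the chosen confidence level.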


2.3.2. Data fusion strategies

For low- and mid-level data fusion, four spectra combinations were tested: FTIR + 1H NMR, FTIR + 13C NMR, 1H NMR + 13C NMR, and FTIR + 1H NMR + 13C NMR.

In low-level fusion, the preprocessed data from the individual spectra were concatenated into a single matrix. The new matrix was preprocessed with SNV and autoscaled. Next, PLS and cross validation were executed sequentially. To reduce the chance of overfitting, the number of latent variables was limited to 10.

In mid-level data fusion, PCA was carried out on each of the three preprocessed individual spectral data sets separately. The first 10 scores of each PCA model were then concatenated into a single matrix, and PLS and cross validation were executed sequentially. To reduce the chance of overfitting, the number of latent variables was limited to 10. This approach was called PCA mid-level data fusion.

In another mid-level strategy, PLS was carried out on each of the three preprocessed individual spectral data sets separately. The first scores of each PLS model from the calibration set were then merged and mean-centered, resulting in a single matrix. A new PLS model and cross validation were executed sequentially on this new matrix. To reduce the chance of overfitting, the number of latent variables of the final model was limited to 10. This approach was called PLS mid-level data fusion.

Finally, all fused and non-fused models were compared by their RMSEP values using the randomization test [28] at a 5% significance level. Fig. 1 shows a scheme of the data fusion methodology used in this paper and also shows the high-level data fusion.
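The low-level and PCA mid-level fusion steps described in this section can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions: it is not the ChemoAC/MatLab implementation used in the study, all function names are hypothetical, the PCA scores are obtained with a simple NIPALS loop, and the toy random matrices merely stand in for preprocessed spectral blocks. The final PLS regression on the fused matrix is not shown.

```python
import math
import random


def snv(row):
    """Standard Normal Variate: centre and scale a single spectrum."""
    mean = sum(row) / len(row)
    sd = math.sqrt(sum((v - mean) ** 2 for v in row) / (len(row) - 1))
    return [(v - mean) / sd for v in row]


def autoscale(matrix):
    """Mean-centre each column and scale it to unit variance."""
    n = len(matrix)
    cols = list(zip(*matrix))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in matrix]


def pca_scores(matrix, n_components, n_iter=500, tol=1e-12):
    """First PCA scores (samples x components) by NIPALS on centred data."""
    n, m = len(matrix), len(matrix[0])
    means = [sum(c) / n for c in zip(*matrix)]
    x = [[v - mu for v, mu in zip(row, means)] for row in matrix]
    scores = []
    for _ in range(n_components):
        # Start from the column with the largest sum of squares.
        start = max(range(m), key=lambda k: sum(row[k] ** 2 for row in x))
        t = [row[start] for row in x]
        for _ in range(n_iter):
            tt = sum(v * v for v in t)
            p = [sum(row[k] * ti for row, ti in zip(x, t)) / tt for k in range(m)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            t_new = [sum(a * b for a, b in zip(row, p)) for row in x]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change < tol:
                break
        # Deflate: remove the component just extracted.
        x = [[v - ti * pk for v, pk in zip(row, p)] for row, ti in zip(x, t)]
        scores.append(t)
    return [list(r) for r in zip(*scores)]


def concatenate(*blocks):
    """Join blocks sample by sample (side-by-side concatenation of variables)."""
    return [sum((list(r) for r in rows), []) for rows in zip(*blocks)]


def low_level_fusion(*blocks):
    """Concatenate preprocessed spectra, then apply SNV and autoscaling."""
    return autoscale([snv(row) for row in concatenate(*blocks)])


def pca_mid_level_fusion(*blocks, n_scores=10):
    """Concatenate the first PCA scores of each block."""
    return concatenate(*(pca_scores(b, n_scores) for b in blocks))


# Toy data standing in for preprocessed FTIR and 1H NMR blocks (hypothetical).
rng = random.Random(0)
ftir = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(8)]
nmr = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(8)]
low = low_level_fusion(ftir, nmr)
mid = pca_mid_level_fusion(ftir, nmr, n_scores=2)
print(len(low), len(low[0]), len(mid), len(mid[0]))  # 8 11 8 4
```

The sketch covers only the calibration samples. In practice, the scaling parameters and loadings estimated on the calibration set would also be applied to the prediction samples, and a numerical library would normally replace the hand-written linear algebra.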

Fig. 1. Scheme of data fusion methodology at low-, PCA mid- and PLS mid-levels.

3. Results and discussion

Fig. 2 shows histograms with the distribution of the physicochemical properties measured by standard methods for all oil samples used in this study. Total S content is one of the most important characteristics determining the quality of a crude oil and, consequently, its commercial value. Lower sulphur content implies better quality. The oil samples used in this paper presented total S content ranging from 0.0044 to 0.96 wt%. According to Riazi's [29] classification, most of these samples are considered sweet oils because their sulphur content is less than 0.5 wt%. The others are termed sour oils; however, the maximum sulphur content in those samples (about 1 wt%) is low compared to certain heavy oils, in which sulphur may reach 6 wt%.

Nitrogen compounds have several deleterious effects on oil processing. Basic nitrogen compounds, in particular, can react with and deactivate the acid sites of catalysts. TN content is usually lower than sulphur content in crude oils, varying from 0.01 to 0.9 wt% [29]. According to Prado et al., almost 90% of crude oils have TN content lower than 0.25 wt% and are classified as nitrogen-poor oils [30]. Crude oils with higher content are called high-nitrogen oils. As shown in Fig. 2b, the oil samples used in this paper presented total nitrogen content ranging from 0.00087 to 0.57 wt%, and most of them are considered high-nitrogen oils. Besides TN, BN was also measured, with contents ranging from 0.0005 to 0.21 wt% and values for most samples around 0.1 wt%, as shown in Fig. 2c.

To avoid corrosion problems in the refining process, a TAN lower than 0.5 mg KOH·g−1 is recommended. According to Fig. 2d, total acid number varied from 0.030 to 4.96 mg KOH·g−1, and most samples presented low TAN values, being considered non-acid oils. The remaining samples can be classified as acid oils, because their TAN is higher than 0.5 mg KOH·g−1.

According to Woods et al., conventional oils (non-heavy oils) have higher amounts of saturated compounds and smaller amounts of polar fraction [31]. The oil samples used in this paper presented a saturated fraction ranging from 36.6 to 90.6 wt%, an aromatic fraction from 7.0 to 37.2 wt% and a polar fraction from 0.0 to 43.5 wt%. Fig. 2e–g show that the major constituent of the oil samples is saturated compounds, which consist of simple chains. The second largest constituent is aromatic compounds, followed, to a lesser extent, by polar compounds. For some samples, the polar content reaches zero percent.

Fig. 2. Histogram of properties for the crude oil samples used in the multivariate calibration.

Fig. 3. Correlation plots between reference and predicted values of the PLS models for S, TN, BN and TAN. Graphs on the left are based on IR data modelling and graphs on the right are based on PLS mid-level fused FTIR plus 1H NMR data modelling. Legend: circles, calibration samples; triangles, prediction samples.

3.1. Models from individual spectral data sets

Table 1 presents the main modelling results from the three individual data sets, FTIR, 1H NMR and 13C NMR (no fusion). Visually, the three analytical techniques performed similarly in predicting the properties. The randomization test [28] was applied to the

Table 1. Statistical parameters of the models based on individual spectra and based on low-, PCA mid- and PLS mid-level data fusion.

| Fusion level | Spectra | S RMSEP (a) | S Rp2 | TN RMSEP (a) | TN Rp2 | BN RMSEP (a) | BN Rp2 | TAN RMSEP (b) | TAN Rp2 | SAT RMSEP (a) | SAT Rp2 | ARO RMSEP (a) | ARO Rp2 | POL RMSEP (a) | POL Rp2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No fusion | FTIR | 0.103 | 0.67 | 0.052 | 0.76 | 0.012 | 0.93 | 0.26 | 0.80 | 5.82 | 0.76 | 4.32 | 0.57 | 7.42 | 0.54 |
| No fusion | 1H NMR | 0.089 | 0.76 | 0.049 | 0.77 | 0.010 | 0.95 | 0.29 | 0.72 | 5.52 | 0.79 | 4.65 | 0.51 | 7.85 | 0.52 |
| No fusion | 13C NMR | 0.081 | 0.80 | 0.073 | 0.59 | 0.020 | 0.77 | 0.30 | 0.69 | 6.69 | 0.67 | 4.26 | 0.59 | 7.34 | 0.56 |
| Low-level | FTIR + 1H NMR | 0.096 | 0.72 | 0.049 | 0.77 | 0.009 | 0.96 | 0.27 | 0.78 | 5.43 | 0.79 | 4.31 | 0.56 | 8.27 | 0.51 |
| Low-level | FTIR + 13C NMR | 0.092 | 0.74 | 0.066 | 0.65 | 0.016 | 0.84 | 0.26 | 0.76 | 6.13 | 0.73 | 4.12 | 0.60 | 7.71 | 0.50 |
| Low-level | 1H NMR + 13C NMR | 0.083 | 0.79 | 0.055 | 0.72 | 0.021 | 0.75 | 0.27 | 0.75 | 5.97 | 0.74 | 4.04 | 0.62 | 7.56 | 0.56 |
| Low-level | FTIR + 1H NMR + 13C NMR | 0.090 | 0.75 | 0.061 | 0.69 | 0.017 | 0.82 | 0.25 | 0.79 | 5.43 | 0.79 | 3.88 | 0.63 | 7.54 | 0.57 |
| PCA mid-level | FTIR + 1H NMR | 0.097 | 0.72 | 0.061 | 0.68 | 0.014 | 0.88 | 0.26 | 0.81 | 5.12 | 0.85 | 3.96 | 0.61 | 7.43 | 0.55 |
| PCA mid-level | FTIR + 13C NMR | 0.094 | 0.73 | 0.075 | 0.65 | 0.016 | 0.85 | 0.39 | 0.58 | 5.79 | 0.78 | 3.77 | 0.63 | 7.24 | 0.56 |
| PCA mid-level | 1H NMR + 13C NMR | 0.114 | 0.64 | 0.053 | 0.72 | 0.022 | 0.74 | 0.21 | 0.90 | 6.05 | 0.74 | 4.31 | 0.55 | 6.60 | 0.65 |
| PCA mid-level | FTIR + 1H NMR + 13C NMR | 0.094 | 0.74 | 0.065 | 0.68 | 0.018 | 0.83 | 0.27 | 0.86 | 5.25 | 0.82 | 3.83 | 0.63 | 7.29 | 0.58 |
| PLS mid-level | FTIR + 1H NMR | 0.071 | 0.85 | 0.049 | 0.78 | 0.007 | 0.98 | 0.22 | 0.82 | 5.65 | 0.77 | 4.55 | 0.50 | 6.62 | 0.65 |
| PLS mid-level | FTIR + 13C NMR | 0.076 | 0.82 | 0.054 | 0.72 | 0.014 | 0.89 | 0.16 | 0.91 | 5.99 | 0.74 | 3.66 | 0.67 | 7.11 | 0.59 |
| PLS mid-level | 1H NMR + 13C NMR | 0.076 | 0.83 | 0.054 | 0.73 | 0.012 | 0.92 | 0.31 | 0.69 | 5.54 | 0.78 | 4.14 | 0.60 | 6.58 | 0.63 |
| PLS mid-level | FTIR + 1H NMR + 13C NMR | 0.064 | 0.87 | 0.050 | 0.76 | 0.0092 | 0.95 | 0.22 | 0.83 | 5.34 | 0.79 | 4.08 | 0.60 | 6.73 | 0.64 |

(a) RMSEP values in wt%. (b) RMSEP values in mgKOH/g.


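Table 2. Number of latent variables used on fusion of the individual models.

| Spectrum / Property | S | TN | BN | TAN | SAT | ARO | POL |
|---|---|---|---|---|---|---|---|
| IR | 12 | 5 | 10 | 20 | 10 | 10 | 10 |
| 1H NMR | 12 | 8 | 10 | 20 | 10 | 10 | 5 |
| 13C NMR | 12 | 6 | 10 | 5 | 5 | 10 | 5 |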

prediction data set to compare the predictive accuracy of these models for each property. The test indicated no significant differences in the accuracy of the FTIR, 1H NMR and 13C NMR models for any property except basic nitrogen, for which the RMSEP of the 13C NMR model differed significantly from those of the FTIR (p-value = 0.015) and 1H NMR (p-value = 0.0002) models, presenting worse performance. Thus, in general, this study showed that the three data sources perform similarly when modeled individually.

Fig. 3a, c, e and g and Fig. 4a, c and e present experimental versus predicted data for the models from individual FTIR spectra (no fusion), for each property. The models chosen to build those graphs are marked with a square in Table 1. The S, BN and TAN results showed good linearity, expressed by higher coefficients of determination. Lower coefficients of determination (below 0.8) were obtained for TN, SAT, ARO and POL, showing lower linearity in the graphs.

Fig. 4. Correlation plots between reference and predicted values of the PLS models for SAT, ARO and POL. Graphs on the left are based on IR data modelling and graphs on the right are based on PLS mid-level fused FTIR plus 1H NMR data modelling. Legend: circles, calibration samples; triangles, prediction samples.

3.2. Low- and PCA mid-level fusion modelling

The main results of low-level and PCA mid-level modelling are presented in Table 1 for each physicochemical property. In most cases, low- and PCA mid-level fusion brought no significant improvement in accuracy. On the contrary, most models performed worse than the models from individual spectra. The poor performance of low-level fusion can be explained by several disadvantages of this strategy. According to Roussel et al., a simple concatenation of noisy and correlated data may worsen the results instead of improving them [32]. Moreover, according to Borràs et al., when data sets differ in dimension and variance, one data source may predominate over the other [5]. The FTIR, 1H NMR and 13C NMR matrices used in this paper presented 3351, 20,433 and 65,533 variables, respectively. This discrepancy between the dimensions of the matrices may have contributed to the failure of low-level fusion. Furthermore, the NMR matrices present greater variance than the FTIR data set. Low- and PCA mid-level fusion were unable to capture the complementarity and synergism between the multispectral data (Table 1). Peinder et al. reached similar conclusions when using infrared, 1H NMR and 13C NMR fused at low and mid level to estimate seven crude oil properties. Their models based on fused spectra did not lead to significant improvements compared to the individual spectra [8].

3.3. PLS mid-level fusion modelling

The main results of PLS mid-level modelling are presented in Table 1. In contrast with the other strategies, this strategy reduced the prediction error of the models. The four spectra combinations, FTIR + 1H NMR, FTIR + 13C NMR, 1H NMR + 13C NMR, and FTIR + 1H NMR + 13C NMR, usually provided more accurate estimates than the individual spectra. The number of latent variables used for data fusion was defined by searching for the best fitted fused model and is presented in Table 2.

For comparison purposes, Figs. 3 and 4 also present (graphs on the right) experimental versus predicted data for the best models (marked with a square in Table 1) based on PLS mid-level fusion. Models from mid-level fusion usually presented better linearity, with higher coefficients of determination for the calibration and prediction sets. The RMSEP was reduced by about 21.0, 0.0, 30.0, 58.4, 3.2, 14.0 and 10.3% for S, TN, BN, TAN, SAT, ARO and POL, respectively, from the non-fused to the PLS mid-level fused models. This shows the power of PLS mid-level fusion in multivariate calibration to increase the accuracy of models.

Fig. 5 shows the ratio between the RMSEP of the best models using low-, PCA mid- and PLS mid-level fusion and the RMSEP of the best individual models. Except for total nitrogen, PLS mid-level fusion reduced the RMSEP value, while low- and PCA mid-level fusion provided no improvement in accuracy in most cases.

Fig. 5. Root mean square error of prediction (RMSEP) of low-, PCA mid- and PLS mid-level fusion models from FTIR plus 13C NMR data in relation to the RMSEP of the individual model from 13C NMR data (RMSEP/RMSEP13C NMR).

The randomization test [28] was applied to the RMSEP values to compare the accuracy of the best fused and non-fused models shown in the graphs of Figs. 3 and 4. At a significance level of 0.05, the p-values were 0.011, 0.993, 0.302, 0.025, 0.661, 0.048 and 0.011 for S, TN, BN, TAN, SAT, ARO and POL, respectively. The test showed that, for S, TAN, ARO and POL, the PLS mid-level models are more accurate than the non-fused models and differ significantly from them (p-value lower than 0.05). In contrast, for TN, BN and SAT, the non-fused and PLS mid-level fused models present statistically equivalent RMSEP values; therefore, the fusion strategy did not improve the models for those properties.

For a more detailed analysis and for comparison, Table 3 presents the main statistical parameters of the best fitted non-fused models and Table 4 those of the best fitted PLS mid-level fusion models for each property. Except for aromatics, the RMSEC and RMSEP values are very similar; therefore, there is no overfitting for either non-fused or fused models. In this paper, a model is considered overfitted when RMSEP > 3·RMSEC. In addition to improving the predictive capacity, PLS mid-level fusion reduced the number of variables from the original matrices to only 2 to 4 latent variables, as shown in Table 4, producing relatively simple models. In contrast, the non-fused models used from 4 to 8 latent variables (Table 3).

After the calibration and prediction steps, the prediction residuals were evaluated by statistical tests. Tables 3 and 4 present the results of the bias test (tcal), which evaluated systematic errors in each model. Since tcal was lower than the critical value (2.06), there is no evidence of systematic error in any model. Tables 3 and 4 also present the results of the non-parametric permutation test, which used a quadratic function to describe the test set residuals. Since the quadratic term was not significant at the 0.05 level, the non-fused models for S, BN, SAT and POL (Table 3) and the fused models for S, ARO and POL (Table 4) showed no trends, indicating that the residuals are randomly distributed around zero.

PLS mid-level fusion models for total S, TAN, SAT, ARO and POL predictions are more accurate than models found in the literature. To estimate sulphur content, Peinder et al. obtained an RMSEP of 0.25 wt% using FTIR spectra associated with PLS, while the minimum achieved in this work was 0.0258 wt% [8]. Barbosa et al. also predicted total S content with an RMSEP of 0.51 wt% using low field nuclear magnetic resonance associated with multiple linear regression [32]. For TAN determination, a much lower RMSEP was reached in this work (0.164 mgKOH·g−1) compared with the results of Barbosa et al., whose lowest RMSEP was 0.3 mg KOH·g−1 [33]. While we reached RMSEP values of 5.34 and 6.57 wt% for saturated and polar contents using data fusion, Filgueiras et al. [34] reached 4.4 and 3.7 wt%, respectively, using one analytical source (13C NMR). However, for aromatic content, we reached a more accurate model, with an RMSEP of 3.65 wt% (Table 4), while Filgueiras et al. [34] reached 4.3 wt%. Other researchers also obtained less accurate models using non-fused data: Rodrigues et al. obtained RMSEP values of 6.7 and 4.05 wt% for SAT and ARO contents, respectively, from high temperature gas chromatography data [35].
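The p-values reported in Section 3.3 for the comparison of fused and non-fused models come from a randomization test on the prediction errors. As a rough illustration, the sketch below implements one common variant of such a test: random sign-flipping of the paired differences in squared residuals, in the spirit of van der Voet's randomization test. Whether this matches the exact procedure of ref. [28] is an assumption; the function name, the number of permutations and the residuals are hypothetical.

```python
import random


def randomization_test(res_a, res_b, n_perm=1999, seed=0):
    """Two-sided p-value for the hypothesis that two models are equally accurate.

    res_a, res_b: prediction residuals of models A and B on the same samples.
    """
    diffs = [a * a - b * b for a, b in zip(res_a, res_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)
    count = 0
    for _ in range(n_perm):
        # Under the null hypothesis the sign of each paired difference is exchangeable.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(permuted) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Made-up residuals: model B is clearly more accurate than model A.
res_a = [0.3, -0.4, 0.5, -0.2, 0.6, -0.5, 0.4, -0.3]
res_b = [0.1, -0.1, 0.2, 0.0, 0.1, -0.2, 0.1, -0.1]
p_value = randomization_test(res_a, res_b)
print(p_value, p_value < 0.05)
```

A p-value below the chosen significance level (0.05 in this study) indicates that the two models differ significantly in accuracy; comparing a model with itself returns a p-value of 1.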
We found no studies in the literature that estimate total N content in crude oils. To predict BN, the lowest RMSEP reached was 0.007 wt%. Meanwhile, the best model for basic nitrogen obtained by Duarte et al. provided a higher RMSEP, 0.009 wt%, using 1H NMR and PLS regression [36]. In turn, Terra et al. reached more accurate models, with an RMSEP of 0.0012 wt%, using PLS with variable selection on mass spectrometry data. The authors, however, used 70 samples ranging from 0.0011 to 0.17 wt% of BN content [37]. The model developed in this paper involves a wider range of BN contents (0.0005 to 0.26 wt%) and a total of 126 samples, which makes it a more robust model.

The best results were achieved using data fusion and the PLS algorithm, which is a linear method and is therefore rapid and simple. It was not necessary to use variable selection or nonlinear methods, which are often time consuming and computationally demanding.

Table 3. Main statistical parameters for the best no fusion models.

| Parameter | S | TN | BN | TAN | SAT | ARO | POL |
|---|---|---|---|---|---|---|---|
| Spectra | 13C NMR | 1H NMR | 1H NMR | FTIR | 1H NMR | 13C NMR | 13C NMR |
| RC2 | 0.936 | 0.951 | 0.948 | 0.910 | 0.807 | 0.930 | 0.838 |
| RP2 | 0.797 | 0.773 | 0.947 | 0.800 | 0.787 | 0.589 | 0.555 |
| RMSEC | 0.0545 | 0.0302 | 0.0108 | 0.279 | 6.249 | 1.805 | 3.505 |
| RMSEP | 0.0811 | 0.0485 | 0.0096 | 0.255 | 5.522 | 4.255 | 7.337 |
| RMSEP (%) | 24.03 | 17.77 | 10.34 | 58.58 | 9.83 | 18.55 | 36.86 |
| LVs | 6 | 8 | 6 | 5 | 4 | 6 | 5 |
| tcal | 0.657 | 0.0208 | 0.363 | 0.120 | 0.501 | 0.142 | 1.003 |
| p-value quadratic | 0.478 | 0.039 | 0.215 | 0.0 | 0.349 | 0.006 | 0.131 |

RMSEC and RMSEP values in wt%, except for TAN, in mgKOH/g.

Table 4. Main statistical parameters for the best PLS mid-level fusion models.

| Parameter | S | TN | BN | TAN | SAT | ARO | POL |
|---|---|---|---|---|---|---|---|
| Spectra | FTIR + 1H NMR + 13C NMR | FTIR + 1H NMR | FTIR + 1H NMR | FTIR + 13C NMR | FTIR + 1H NMR + 13C NMR | FTIR + 1H NMR | 1H NMR + 13C NMR |
| RC2 | 0.985 | 0.967 | 0.984 | 0.987 | 0.946 | 0.982 | 0.834 |
| RP2 | 0.870 | 0.777 | 0.975 | 0.907 | 0.792 | 0.665 | 0.634 |
| RMSEC | 0.0258 (a) | 0.0241 (a) | 0.0059 (a) | 0.106 (b) | 3.223 (a) | 0.901 (a) | 3.477 (a) |
| RMSEP | 0.0641 (a) | 0.0486 (a) | 0.0070 (a) | 0.164 (b) | 5.342 (a) | 3.658 (a) | 6.577 (a) |
| RMSEP (%) | 19.0 | 17.78 | 7.51 | 37.57 | 9.51 | 15.94 | 33.04 |
| LVs | 4 | 3 | 3 | 3 | 2 | 4 | 2 |
| tcal | 0.305 | 0.474 | 0.394 | 0.0543 | 0.570 | 0.938 | 0.494 |
| p-value quadratic | 0.377 | 0.0 | 0.001 | 0.004 | 0.047 | 0.124 | 0.236 |

(a) Values in wt%. (b) Values in mgKOH/g.

3.4. Variable analysis

Fig. 6a shows the FTIR spectra of all petroleum samples. The peaks near 2900 and 1450 cm−1 are due to C–H stretching and deformation in aliphatic chains, respectively. The medium intensity signal near 1375 cm−1 corresponds to the vibrations of methyl groups attached to carbon atoms. The region between 1200 and 650 cm−1, known as the fingerprint region, presents a mixture of signals from vibrations of aliphatic and aromatic chains and also signals of some heteroatoms such as sulphur, nitrogen, fluorine and chlorine [38].

Fig. 6. (A) FTIR spectra of all oil samples. Overlaid regression coefficients of the first latent variables of the FTIR individual models for saturated (B), aromatic (C) and polar (D) prediction.

The overlaid 1H NMR spectra of all petroleum samples are shown in Fig. 7a. According to Masili et al. [39], the signals between 0.5 and 1.0 ppm refer to CH3 in aliphatic chains, and the high intensity peak at 1.2–1.3 ppm is due to CH2 in aliphatic chains. Signals between 1.5 and 1.6 ppm refer to β-aromatic CH2. The highlighted region between 2.0 and 2.5 ppm refers to α-aromatic CH3, and the region between 7.2 and 7.4 ppm to di-aromatic chains [39]. The solvent, CDCl3, has a peak in the 5.3 ppm region, where there is no sample signal and thus no overlapping problem. The solvent signal can be used to verify the alignment of peaks between the sample spectra.

Fig. 7. (A) 1H NMR spectra of all oil samples. Overlaid regression coefficients of the first latent variables of the 1H NMR individual models for saturated (B), aromatic (C) and polar (D) prediction.

Fig. 8a shows the overlaid 13C NMR spectra of all petroleum samples. The peak slightly below 80 ppm corresponds to the signal of the solvent CDCl3 used in the procedure. In this region there is no sample signal, so there are no overlapping problems between solvent and sample signals. The region below 40 ppm corresponds mainly to aliphatic chain carbons, and the highlighted region between 120 and 140 ppm corresponds mainly to aromatic carbons.

In a linear model, the variables with the highest coefficients contribute most to the model and thus have the highest correlation with the estimated property. Conversely, variables with coefficients close to zero have little importance in the model. For the variable analysis, we evaluated the contribution and importance of the original variables in each individual model. Fig. 6b–d show the overlaid regression coefficients of the first latent variables (used in data fusion) of the individual models built from FTIR data, for saturated, aromatic and polar prediction, respectively. Figs. 7b–d and 8b–d show the same for models from 1H NMR and 13C NMR data, respectively. For a given data source (FTIR, 1H NMR or 13C NMR), all three graphs have similar profiles, so the same variables are important for saturated, aromatic and polar predictions. In addition, the most important variables come from spectral regions with analytical signals. This indicates that these properties are well correlated with the three types of spectra and that the models were built on this correlation. Minero et al. [40] also took advantage of this and, from 1H and 13C NMR analyses, developed correlations to predict the SARA concentration of crude oils, obtaining good accuracy.

Fig. 8. (A) 13C NMR spectra of all oil samples. Overlaid regression coefficients of the first latent variables of the 13C NMR individual models for saturated (B), aromatic (C) and polar (D) prediction.

4. Conclusions

In this study, FTIR, 1H NMR and 13C NMR spectroscopy methods were applied to predict seven properties of crude oil samples. The partial least squares regression method, combined with low-level, PCA mid-level and PLS mid-level data fusion, was used to build the calibration models. Subsequently, the prediction performance of all models was compared. The individual techniques were less effective than the fused data, and more limited, in estimating some oil properties. Low-level and PCA mid-level fusion could not produce models with better accuracy than the models from the individual techniques; therefore, the use of these strategies was not justified in this case. However, PLS mid-level data fusion provided models with better prediction performance for S, TAN, ARO and POL, reaching RMSEP values lower than those of the models from individual spectra. Therefore, PLS mid-level data fusion proved to be the best fusion strategy in this paper. The results indicate that multispectral data can complement each other and act synergistically to establish a correlation between the spectral signals and the estimated property. As a consequence, this study demonstrated that data fusion of FTIR, 1H NMR and 13C NMR techniques, associated with the PLS calibration algorithm, can be a fast, efficient and accurate method to estimate certain physicochemical properties of crude oil.

CRediT authorship contribution statement

Mariana K. Moro: Conceptualization, Data curation, Writing - original draft. Álvaro C. Neto: Writing - original draft. Valdemar Lacerda: Writing - original draft. Wanderson Romão: Writing - original draft. Luiz S. Chinelatto: Writing - original draft. Eustáquio V.R. Castro: Conceptualization. Paulo R. Filgueiras: Conceptualization, Writing - original draft.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors would like to thank LABPETRO-UFES and Centro de Pesquisas e Desenvolvimento Leopoldo Américo Miguez de Mello (Cenpes – Petrobras) for providing the crude oil samples; Fundação de Amparo à Pesquisa e Inovação do Espírito Santo (FAPES) [33530.503.20537.12092017] and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) [process number 422515/2016-7] for their financial support; and NCQP/UFES for the analyses. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.fuel.2019.116721.

References

[1] Brereton RG. Introduction to multivariate calibration in analytical chemistry. Analyst 2000;125:2125–54. https://doi.org/10.1039/b003805i.
[2] Castanedo F. A review of data fusion techniques. ScientificWorldJournal 2013. https://doi.org/10.1155/2013/704504.
[3] Bevilacqua M, Marini F, Rinnan A, Rasmussen MA, Skov T. Recent chemometrics advances for Foodomics. Trends Anal Chem 2017;96:42–51. https://doi.org/10.1016/j.trac.2017.08.011.
[4] Steinmetz V, Sévila F, Bellon-Maurel V. A methodology for sensor fusion design: application to fruit quality assessment. J Agr Eng Res 1999;74:21–31. https://doi.org/10.1006/jaer.1999.0428.
[5] Borràs E, Ferre J, Boque R, Mestres M, Acena L, Busto O. Data fusion methodologies for food and beverage authentication and quality assessment – A review. Anal Chim Acta 2015;891:1–14. https://doi.org/10.1016/j.aca.2015.04.042.
[6] Marquez C, López MI, Ruisánchez I, Callao MP. FT-Raman and NIR spectroscopy data fusion strategy for multivariate qualitative analysis of food fraud. Talanta 2016;161:80–6. https://doi.org/10.1016/j.talanta.2016.08.003.
[7] Li Y, Zhang JY, Wang YZ. FT-MIR and NIR spectral data fusion: a synergetic strategy for the geographical traceability of Panax notoginseng. Anal Bioanal Chem 2018;410:91–103. https://doi.org/10.1007/s00216-017-0692-0.

[8] Peinder P, Visser T, Petrauskas DD, Salvatori F, Soulimani F, Weckhuysen BM. Partial least squares modeling of combined infrared, 1H NMR and 13C NMR spectra to predict long residue properties of crude oils. Vib Spectrosc 2009;51:205–12. https://doi.org/10.1016/j.vibspec.2009.04.009.
[9] Bajoub A, Medina-Rodríguez S, Gómez-Romero M, Ajal E, Bagur-González MG, Fernández-Gutiérrez A, et al. Assessing the varietal origin of extra-virgin olive oil using liquid chromatography fingerprints of phenolic compound, data fusion and chemometrics. Food Chem 2017;215:245–55. https://doi.org/10.1016/j.foodchem.2016.07.140.
[10] Obisesan KA, Jiménez-Carvelo AM, Cuadros-Rodriguez L, Ruisánchez I, Callao MP. HPLC-UV and HPLC-CAD chromatographic data fusion for the authentication of the geographical origin of palm oil. Talanta 2017;170:413–8. https://doi.org/10.1016/j.talanta.2017.04.035.
[11] Cuevas FJ, Pereira-Caro G, Moreno-Rojas JM, Muñoz-Redondo JM, Ruiz-Moreno MJ. Assessment of premium organic orange juices authenticity using HPLC-HR-MS and HS-SPME-GC-MS combining data fusion and chemometrics. Food Control 2017;82:203–11. https://doi.org/10.1016/j.foodcont.2017.06.031.
[12] Tan J, Li R, Jiang Z. Chemometric classification of Chinese lager beers according to manufacturer based on data fusion of fluorescence, UV and visible spectroscopies. Food Chem 2015;184:30–6. https://doi.org/10.1016/j.foodchem.2015.03.085.
[13] Apetrei IM, Rodríguez-Méndez ML, Apetrei C, Nevares I, Alamo M, Saja JA. Monitoring of evolution during red wine aging in oak barrels and alternative method by means of an electronic panel test. Food Res Int 2012;45:244–9. https://doi.org/10.1016/j.foodres.2011.10.034.
[14] Zude M, Herold B, Roger JM, Maurel VB, Landahl S. Non-destructive tests on the prediction of apple fruit flesh firmness and soluble solids content on tree and in shelf life. J Food Eng 2006;77:254–60. https://doi.org/10.1016/j.jfoodeng.2005.06.027.
[15] Mendoza F, Lu R, Cen H. Comparison and fusion of four nondestructive sensors for predicting apple fruit firmness and soluble solids content. Postharvest Biol Tec 2012;73:89–98. https://doi.org/10.1016/j.postharvbio.2012.05.012.
[16] Xiaobo Z, Jiewen Z. Apple quality assessment by fusion three sensors. Agricultural Product Processing Lab, Jiangsu University, Zhenjiang, China, 2013. https://doi.org/10.1109/ICSENS.2005.1597717.
[17] Smolinska A, Blanchet L, Coulier L, Ampt KAM, Luider T, Hintzen RQ, et al. Interpretation and visualization of non-linear data fusion in kernel space: study on metabolomic characterization of progression of multiple sclerosis. PLoS ONE 2012;7:e28163. https://doi.org/10.1371/journal.pone.0038163.
[18] AMERICAN PETROLEUM INSTITUTE. API. https://www.api.org/.
[19] ASTM D4294-16. Standard test method for sulfur in petroleum and petroleum products by energy dispersive X-ray fluorescence spectrometry. West Conshohocken, PA: ASTM International; 2016. https://doi.org/10.1520/D4294-16.
[20] ASTM D4629-17. Standard test method for trace nitrogen in liquid hydrocarbons by syringe/inlet oxidative combustion and chemiluminescence detection. West Conshohocken, PA: ASTM International; 2017. https://doi.org/10.1520/D4629-17.
[21] UOP Method 269-10. Nitrogen Bases in Hydrocarbons by Potentiometric Titration. West Conshohocken, PA, USA: American Society for Testing and Materials; 2010.
[22] ASTM D664-04. Standard test method for acid number of petroleum products by potentiometric titration. West Conshohocken, PA: ASTM International; 2004. https://doi.org/10.1520/D0664-04.
[23] ASTM D2549-02. Standard test method for separation of representative aromatics and nonaromatics fractions of high-boiling oils by elution chromatography. West Conshohocken, PA: ASTM International; 2017. https://doi.org/10.1520/D2549-02R17.
[24] Liang C, Yuan H-F, Zhao Z, Song C-F, Wang J-J. A new multivariate calibration model transfer method of near-infrared spectral analysis. Chemom Intell Lab Syst 2016;153:51–7. https://doi.org/10.1016/j.chemolab.2016.01.017.
[25] Tomasi G, Savorani F, Engelsen SB. Icoshift: an effective tool for the alignment of chromatographic data. J Chromatogr A 2012;1218:7832–40. https://doi.org/10.1016/j.chroma.2011.08.086.
[26] Filgueiras PR, Sad CMS, Loureiro AR, Santos MFP, Castro EVR, Dias JCM, et al. Determination of API gravity, kinematic viscosity and water content in petroleum by ATR-FTIR spectroscopy and multivariate calibration. Fuel 2014;116:123–30. https://doi.org/10.1016/j.chemolab.2014.02.002.
[27] Filgueiras PR, Alves JCL, Sad CMS, Castro EVR, Dias JCM, Poppi RJ. Evaluation of trends in residuals of multivariate calibration models by permutation test. Chemom Intell Lab Syst 2014;133:33–41. https://doi.org/10.1016/j.chemolab.2014.02.002.
[28] Voet HVD. Comparing the predictive accuracy of models using a simple randomization test. Chemom Intell Lab Syst 1994;25:313–23. https://doi.org/10.1016/0169-7439(94)85050-X.
[29] Riazi MR. Characterization and properties of petroleum fractions. ASTM Stock Number: MNL50; 2005. http://dx.doi.org/10.1520/MNL50-EB.
[30] Prado GHC, Rao Y, Klerk A. Nitrogen removal from oil: a review. Energ Fuels 2017;31:14–36. https://doi.org/10.1021/acs.energyfuels.6b02779.
[31] Woods J, Kung J, Kingston D, Kotlyar L, Sparks D, McCracken T. Canadian crudes: a comparative study of SARA fractions from a modified HPLC separation technique. Oil Gas Sci Technol 2008;1:151–63. https://doi.org/10.2516/ogst:2007080.
[32] Roussel S, Bellon-Maurel V, Roger JM, Grenier P. Authenticating white grape must variety with classification models based on aroma sensors, FT-IR and UV spectrometry. J Food Eng 2003;60:407–19. https://doi.org/10.1016/S0260-8774(03)00064-5.
[33] Barbosa LL, Sad CMS, Morgan VG, Filgueiras PR, Castro EVR. Application of low field NMR as an alternative technique to quantification of total acid number and sulphur content in petroleum from Brazilian reservoirs. Fuel 2016;176:146–52. https://doi.org/10.1016/j.fuel.2016.02.085.
[34] Filgueiras PR, Portela NA, Silva SRC, Castro EVR, Oliveira LMSL, Dias JCM, et al. Determination of saturates, aromatics, and polars in crude oil by 13C NMR and support vector regression with variable selection by genetic algorithm. Energ Fuels 2016;30:1972–8. https://doi.org/10.1021/acs.energyfuels.5b02377.
[35] Rodrigues RRT, Rocha JTC, Oliveira LMSL, Dias JCM, Müller EI, Castro EVR, et al. Evaluation of calibration transfer methods using the ATR-FTIR technique to predict density of crude oil. Chemom Intell Lab Syst 2017;166:7–13. https://doi.org/10.1016/j.chemolab.2017.04.007.
[36] Duarte LM, Filgueiras PR, Silva SRC, Dias JCM, Oliveira LMSL, Castro EVR, et al. Determination of some physicochemical properties in Brazilian crude oil by 1H NMR spectroscopy associated to chemometric approach. Fuel 2016;181:660–9. https://doi.org/10.1016/j.fuel.2016.05.049.
[37] Terra LA, Filgueiras PR, Tose LV, Romão W, Castro EVR, Oliveira LMSL, et al. Laser desorption ionization FT-ICR mass spectrometry and CARSPLS for predicting basic nitrogen and aromatics contents in crude oils. Fuel 2015;160:274–81. https://doi.org/10.1016/j.fuel.2015.07.099.
[38] Pavia DL, Lampman GM, Kriz GS, Vyvyan JR. Introduction to spectroscopy. 4th ed. Cengage Learning; 2008.
[39] Masili A, Puligheddu S, Sassu L, Scano P, Lai A. Prediction of physical–chemical properties of crude oils by 1H NMR analysis of neat samples and chemometrics. Magn Reson Chem 2012;50:729–38.
[40] Minero FS, Anchieta J, Oliver GS, Flores S. Predicting SARA composition of crude oil by means of NMR. Fuel 2013;110:318–21.