Improving accuracy and reproducibility of vibrational spectra for diluted solutions


Analytica Chimica Acta xxx (2017) 1–12

Contents lists available at ScienceDirect

Analytica Chimica Acta journal homepage: www.elsevier.com/locate/aca

Dusan Kojić a,b,*, Roumiana Tsenkova b,c, Masato Yasui a,b

a Department of Pharmacology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
b Keio Advanced Research Center for Water Biology and Medicine, Keio University, Mita 2-15-45, Minato-ku, Tokyo, 108-8345, Japan
c Department of Agricultural Engineering and Socio-Economics, Kobe University, 1-1 Rokkodai, Nada-ku, Kobe, 657-8501, Japan

Highlights

• Fast, simple, and accurate removal of bulk solvent spectra is proposed.
• Increased accuracy in moderate and low concentrations is demonstrated.
• Results are applicable to other spectroscopic methods and other solvents.

Article info

Article history: Received 24 May 2016; received in revised form 1 December 2016; accepted 14 December 2016; available online xxx

Keywords: Water; Hydrogen bonding; Spectral pre-processing; Spectral subtraction; Near infrared spectroscopy

Abstract

What appears to be a trivial operation, in which the averaged spectrum of the solvent is subtracted from the spectra of solutions, can be a misleading step in improving the reproducibility of vibrational spectra. Near-infrared spectra of pure water and glycine solutions were used to quantify instrumental and spectral variations and to examine their influence on the reproducibility of difference spectra over a wide concentration range. Significant (fourfold) improvements were observed in comparison with the most commonly applied technique, which uses an averaged spectrum of the solvent. We propose a new technique in which the closest solvent spectrum is subtracted; it is identified by calculating the smallest area under the subtracted curve, which extracts the optimal outcome. These results reveal that, contrary to common practice, reproducibility for spectra of diluted solutions can bypass even instrumental baseline shifts and render results that are limited only by the noise originating from the instrument's sensor. © 2016 Elsevier B.V. All rights reserved.

* Corresponding author. Department of Pharmacology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan. E-mail address: [email protected] (D. Kojic).

http://dx.doi.org/10.1016/j.aca.2016.12.019 0003-2670/© 2016 Elsevier B.V. All rights reserved.

Please cite this article in press as: D. Kojic, et al., Improving accuracy and reproducibility of vibrational spectra for diluted solutions, Analytica Chimica Acta (2017), http://dx.doi.org/10.1016/j.aca.2016.12.019

1. Introduction

Stable performance of liquid-state spectroscopy in building accurate predictive models and elucidating solute-solvent interactions relies on reproducible identification and isolation of solute-induced spectral features. Near-infrared spectra are composed of numerous, intrinsically wide and overlapping bands whose extraction is additionally obscured by the strong background signal arising from the bulk solvent, particularly in the case of aqueous solutions. Visualization of these bands requires spectral pre-processing methods based on either (1) derivatives, scatter corrections, and statistically based normalizations, or (2) subtraction of the solvent spectrum. Over the past few decades, the first group of methods received greater attention and demonstrated more stable improvements due to their relatively small dependence on the type of measurement instrument used [1–3], whereas subtraction techniques remained less significant due to slight customization to the employed spectral range [4–9]. Nevertheless, if the following guidelines are implemented, both approaches are believed to guarantee stable results, even for diluted solutions: (1) increased sampling, (2) increased acquisition of consecutive spectra, (3) sufficient repetition of experiments, and (4) sufficient control of environmental conditions. Even in the absence of temperature-induced changes, these guidelines can be severely challenged by at least four processes:

1. Solvent displacement: an increase in solute content displaces solvent molecules, which in turn lowers absorbance and increases the distance between solvent and solute spectra.
2. Force-field changes: replacement of water with solute molecules changes the distribution of hydrogen bonds (HB) from solvent-solvent to solvent-solute HBs, which generates red/blue shifts and further separates the spectra of solvent and solute.
3. Concentration gradient: solute-induced band intensities decay with dilution, which makes them more difficult to detect.
4. Instrumental baseline shifting and wavelength inaccuracy: these provide additional sources of variation, independent of solute and solvent, which destabilize the subtraction outcome.

This report examines the ability of solvent subtraction techniques to respond to all of these challenges. Initial approaches to subtraction in the near-infrared region recommended averaging the solvent spectra and subtracting the average from the (averaged) spectra of solutions. Increasingly sophisticated approaches, conducted in the mid-infrared (mid-IR) region, included:

1. Corrections of the solvent spectrum by repeated scaling/shifting until some measure of accuracy reaches sufficiently low values [4,5];
2. Analytical models that utilize previously corrected spectra and are calibrated using specific bands in the mid-IR region, a choice motivated by the analysis of certain solutes (proteins [4,5] or salts [6–9]);
3. HDO as an alternative solvent, used to isolate and precisely characterize the O–H oscillator [6,7,9];
4. Molecular dynamics simulations to refine the accuracy of analytical models [9].

This group of methods mainly focused on responding to the processes labeled above as 1 and 3. Since wavelength shifts (process 2) and instrumental error (process 4) were not explicitly treated, we decided to extend the previous analyses and estimate the levels of variability in the spectra of pure solvent and solutions (processes 2 and 4), in order to better understand their influence on the accuracy of the subtraction procedure. This report also focuses on our new approach to spectral subtraction and its comparison with the simplest classical method, which uses an averaged spectrum of the solvent. Our technique suppresses all of the aforementioned detrimental influences more effectively, without using averaged spectra, scaling/shifting, or any kind of solute-based calibration. It does so by subtracting each consecutive spectrum of solvent from each consecutive spectrum of solution. This subtraction is performed by creating all possible pairs of differences (solution − pure solvent) and locating the closest pair by selecting the difference spectrum with the smallest area under the curve.

Regarding the choice of pure solvent spectra, we note two possibilities: (1) using spectra from current experiments, and (2) generating a library of solvent spectra. The library that we used was constructed to account for any solute, which necessarily covers a wider range of spectra than required for this study; as such, it is quite large but also reusable and beneficial in increasing efficiency in long-term use. Additionally, its size can be optimized to reduce the overall computational cost (as pointed out later in this paper). Additional information on the library is provided in the Materials and methods section of this paper. We begin the report by analyzing each source of variation and comparing its influence on the results obtained from the classic and new methods.

2. Materials and methods

2.1. Sample preparation

Powdered formulation of glycine (Wako Pure Chemical Industries, Ltd.) was dissolved in redistilled water (18.5 MΩ·cm at 25 °C, Milli-Q, Millipore, USA). Solutions were prepared in concentration steps of 1000, 500, 250, 125, 62, 31, 15, and 7 mM by serial dilution. Absorption spectra were collected using a Fourier transform near-infrared transmission spectrometer (FT-NIR MPA, Bruker Optics, Germany) fitted with a quartz cuvette of 1 mm path length. Spectra were acquired in the range between 12000 and 4000 cm⁻¹ (800–2500 nm), using a spectral resolution of 8 cm⁻¹. Prior to measurement, samples were incubated at a temperature of 32 °C in a thermostated chamber. The measurement compartment was controlled at 32 °C using the spectrometer's thermostat. The display accuracy of both devices was ±0.5 °C, which presents a non-negligible measurement error in comparison with the previously suggested limits of ±0.1 °C [10].

Generating a temperature-stabilized environment requires the introduction of at least three sub-systems for thermal insulation and active control. These sub-systems are: (1) sample incubation and delivery, (2) spectrometer compartment, and (3) water irrigation-based temperature-control system. Therefore, increasing the precision to ±0.1 °C will only improve measurement accuracy but not stabilize the temperature of the sample compartment. For all three systems the connection hoses also need to be thermally insulated. The financial and labor costs invested in assembling such a system may be outweighed by instrumental baseline fluctuations (≈10⁻³ a.u.) and sporadic fluctuations in wavelength positions, as shown later in the text, which remain the main limiting factors for achieving higher accuracy.

2.2. Experiments

A total of 11 experiments were performed. Data from three experiments generated outlying spectra, so the dataset was reduced to 8 experiments. In each experiment, 10 replicate samples were measured for each concentration level and for pure water. For each sample, 10 consecutive spectra were collected, thus generating 100 spectra per experiment at each concentration level, or 8000 spectra for all concentration levels.

2.3. Spectral pretreatment

The region below 1100 nm was discarded due to low signal and relatively high noise levels. The region around 1900 nm was also excluded due to the overwhelmingly high absorption routinely encountered in aqueous samples. Therefore, focus was placed on the region between 1100 and 1800 nm. In order to minimize the influence of baseline shifting prior to subtraction, raw spectra were pretreated using the standard normal variate (SNV) normalization method [11]. The new method was applied in two slightly different ways. The first utilized the solvent spectra obtained from the current experiments, while the second utilized a library containing a larger number of spectra taken over a relatively wide range of


environmental variations (temperature: 10–80 °C; atmospheric pressure: 990–1030 mbar; relative humidity: 30–60%). The distance between all obtainable wavelength positions for the main peak of water in the first overtone region (around 1445 nm) is defined as the spectral library resolution. Due to the inverse relationship between wavelengths and wavenumbers, the resolution increased from 0.46 nm at 1100 nm to 1.3 nm at 1800 nm (for a spectral resolution set to 8 cm⁻¹). The slightly unbalanced distribution of the 9000 spectra contained in our library had the highest density for room temperature conditions (around 6000 spectra), with the remaining temperature range extending towards 4 and 80 °C. The corresponding range of wavelength positions for the first overtone peak was 1445–1454 nm. This range was estimated as sufficient to match the expected wavelength shifts for most solutes. In order to illustrate the advantages of the new method more clearly, precedence will be given to results obtained using this library, as better results were achieved for concentrated solutions. However, negligible differences were found in diluted solutions. Where appropriate, results obtained using the spectra from the current study will also be included and referred to as 'the experimental closest spectrum'. All calculations were performed using R software [12].

3. Results

Previous analyses of the instrumental accuracy required for accurate reproduction of subtracted absorptions [13,14] yielded criteria too stringent for the great majority of commercially available instruments. It is therefore necessary to accept the influence of instrumental error. This report begins by assessing the instrumental variations in terms of baseline shifting and wavelength accuracy, which will serve as the basis for all other sources of variability.

3.1. The influence of instrumental baseline shifting

The instrument used in this study was not equipped with components for dry nitrogen purging, and the accuracy of the available temperature control devices for the sample incubator and sample compartment was relatively low (see the Materials and methods section). Water vapor reproducibility was calculated separately for water vapor absorbing and non-absorbing regions, using the background spectra taken with an empty sample compartment. The region between 833 and 2500 nm contains four regions with vapor bands and is presented in Fig. 1. Absorption intensifies significantly from region 1 to region 4, varying across nearly two orders of magnitude (panels E–H), which presents a good opportunity to analyze the relationship between band intensities and the extent of instrumental baseline shifting.

Results of PCA on non-subtracted background spectra are shown in Fig. 2. In panels I–P, one can see how baseline shifts influence the reproducibility of water bands. The band intensities are lowest in R1; this region hosts a group of very weak bands that were previously assigned to various combinations of 2ν1+ν2+ν3 [15]. Loading plots show that nearly two PCs are required to remove the baseline influence and that the third PC finally reconstructs the bands shown in panel E. Somewhat stronger bands are found in region R2 and are assigned to combinations of ν1+ν2+ν3 [15]. The first PC isolates the influence of baseline shifting, while the second PC reveals the shape of the vapor bands. The same situation is observed within region R3. In region R4, the bands are directly reconstructed from the first PC. In conclusion, baseline shifting increasingly obscures spectral features as the absorption decreases from R4 to R1. The same behavior is anticipated with dilutions. Short wavelength bands are buried in the composite (non-subtracted) signal, in the same way that bands in diluted solutions fall below the threshold of baseline variations, thus requiring additional PCs to bring out their shapes and intensities. Subtraction techniques are designed to improve this situation and reduce the number of PCs required to isolate the bands.

Fig. 1. Raw and background spectra are shown in the first overtone region. Panel A: raw spectra with main band assignments for pure water. Panel B: full black line is the background spectrum. Four regions (labeled R1 to R4) mark the positions of vapor bands. Gray line is the standard deviation calculated at each wavelength; two prominent regions (R3 and R4) dominate the signal. Dashed gray line represents the same calculation with regions R1–R4 excluded.

We proceed by calculating baseline variations at the arrow-labeled points shown in Fig. 2, panels E–H. In each region three positions were selected. The first position was set before the vapor band region, and the second was set after it. The third position marks the strongest vapor band within the region. The results, shown in Fig. 3, panel A, indicate stable, higher values for vapor bands, which are unaffected by baseline shifts in regions R3 and R4. In contrast, regions R1 and R2 have weaker bands whose intensities are comparable with the extent of baseline shifting. Vapor bands were excluded in panel B, which suggests not only that the range of baseline shifting increases with wavelength position, but also that variations of 0.7–1.4 × 10⁻³ a.u. occur at R3. This range is later compared with the intensities of solute-induced bands in diluted solutions.
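The two quantities used in this subsection, the standard deviation at each wavelength and the range of baseline values at a selected position, can be illustrated with a short sketch. It is written in Python for illustration only (the study itself used R). The function names, the list-of-lists data layout, and the choice of the sample standard deviation are assumptions and are not taken from the paper.

```python
from statistics import stdev


def per_wavelength_sd(spectra):
    """Sample standard deviation at each wavelength.

    `spectra` is a list of replicate spectra (assumed layout): each spectrum
    is a list of absorbance values sampled on the same wavelength grid.
    """
    # zip(*spectra) regroups the values so that each tuple holds all
    # replicate absorbances recorded at one wavelength.
    return [stdev(column) for column in zip(*spectra)]


def baseline_range(spectra, index):
    """Range (max - min) of absorbance at one selected grid position."""
    values = [spectrum[index] for spectrum in spectra]
    return max(values) - min(values)


if __name__ == "__main__":
    # Two hypothetical background spectra on a two-point grid.
    backgrounds = [[0.0, 1.0], [0.0, 3.0]]
    print(per_wavelength_sd(backgrounds))
    print(baseline_range(backgrounds, index=1))
```

In practice, the index passed to `baseline_range` would correspond to one of the three positions selected in each region (before the band, after the band, and at the strongest vapor band).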

3.2. The influence of the light source on pure water

When consecutive spectra of pure water are taken, the spectral


Fig. 2. Baseline removal enhances the visualization of vapor bands. Each column of plots describes one band. Each row presents different processing. Panels A–D: raw spectra; dashed lines represent first (R2) and second order (R1, R3 and R4) fitted baselines that were artificially shifted for illustration purposes. Panels E–H: gray lines show vapor bands after baseline subtraction; the overlapped black line shows the standard deviation calculated at each wavelength. Arrows show the points at which baseline variation is calculated and summarized (Fig. 3). In general, wavelength accuracy is satisfactory within the limits of the spectral resolution that was used. Panels I–L: loading plots for the first PC. Panels M–O: loading plots for the second PC. Panel P: loading plot for the third PC.


Fig. 3. Baseline variations are shown without the vapor bands to clarify their range of values. Panel A: baseline variations calculated at the arrow-labeled points shown in panels E–H of Fig. 2. Three positions in each region were used: one before and one after the vapor band region, and a third belonging to the strongest vapor band within the region. Panel B: data from panel A shown without vapor bands in order to more clearly observe the increase in the range of instrumental baseline shift.

changes observed resemble temperature-induced spectral patterns [16–18]. In comparison with the initial (background) spectrum, the absorption intensity of the peak at 1412 nm increases, while that of the peak at 1492 nm decreases. To illustrate this point, consecutive spectra of pure water were taken as follows: (1) a background spectrum was taken with pure water placed in the sample chamber, and (2) 10 consecutive spectra were taken. The results presented in Fig. 4 show how readily pure water responds to light exposure. The first few spectra were unstable, but the absorption values stabilize as acquisition evolves towards the 10th spectrum. Previous work [16–18] showed that rising absorption at 1412 nm and decreasing absorption at 1492 nm correspond to blue shifting


Fig. 5. Variability in absorption is analyzed at 1412 nm for concentration of 1000 mM. Within Experiment Variability: variation within one experiment is summarized through 0.05, 0.5, and 0.95 quantiles of absorption intensities. Within Consecutive Variability (WCV): we averaged the 1st consecutive spectrum over all samples within one experiment, and the same is repeated for each consecutive spectrum; variation of thus obtained absorption values is summarized using the same quantiles as above. WCV (all Exp.): averages of 1st through 10th consecutive spectra were taken over all experiments and this variation is summarized using the same quantiles as above.

of the first overtone peak of water around 1450 nm. It can be concluded that, relative to the position of the first spectrum, light causes blue shifts in the consecutive spectra. These changes were caused by temperature perturbation of the sample and heating that was induced by the light source. Even though the sample and sample compartment were thermostated, the measurement chamber is not thermally insulated. Furthermore: (1) when positioned within the sample chamber, the cuvette partially protrudes from the compartment, exposing it to external air, (2) the position of the thermostat sensor is on the outer wall of the sample compartment and not immersed in the sample, and (3) finite time needed for transferring the sample from the chamber to measurement compartment creates a brief heat exchange with the environmental air, causing non-negligible spectral shifts.

Fig. 4. The influence of the light source is not negligible. The inset in the upper left corner shows that spectral changes focus around two wavelength regions: 1412 and 1492 nm; the spectrum of pure water at t = 0 s coincides with the abscissa. The central panel magnifies the region around 1412 nm, while the rightmost panel magnifies the region around 1492 nm. The lower left inset tracks the absorption intensity at 1412 nm over each consecutive spectrum for 3 samples.


Fig. 6. Variation in the mean-centered spectra of pure water and glycine is driven by consecutive spectra. Central panel: mean-centered spectra of pure water; two spectral patterns are observed and labeled with dotted and dashed lines; they describe a common spectral pattern containing two peaks (with opposing signs) centered at 1412 and 1492 nm. Left panel: distributions of absorptions at 1412 nm. Right panel: correlation between the absorption intensity at 1412 nm and the order number of consecutive spectrum.

3.3. The spectral variation for pure water

We next inquired as to whether the light source was a constant influence across multiple experiments, concentration levels, and subtle changes in environmental conditions. Absorption intensity at 1412 nm was monitored in the non-subtracted (SNV-corrected) spectra, under the influence of light (consecutive spectra). The largest variation comes from consecutive spectral acquisition (rightmost column, Fig. 5), followed by between-sample variance

(leftmost column), with the lowest values being generated between replicates within the same experiment (central column). These trends are not caused by the SNV transform, as they apply equally to the raw spectra (Fig. S1). Averaging over consecutive spectra (right column, Fig. 5) reveals that the influence of combined temperature/light source perturbations is substantial and sustained over significant environmental variations. The reasons are investigated in Fig. 6, where mean-centered spectra were used to estimate the variation at 1412 nm

Fig. 7. Spectra of pure water with identical wavelength positions of the first overtone peak (1450 nm) exhibit significant residual variation at 1412 nm. The upper left panel displays the histogram of wavelength positions of the first overtone peak. All spectra are found at one of the four positions shown. The height of each vertical bar is the number of spectra whose peaks are found at that position. The lower left panel shows the 5th, 50th and 95th percentiles of variation at 1412 nm in mean-centered spectra (Fig. 6, central panel) for each fixed wavelength position. The upper right panel displays the same data for samples taken from our library (see the Materials and methods section). From the plot in the lower right panel we see that the range of values observed in the panels on the left side (±11 × 10⁻³ a.u., dotted lines) applies to a large majority of water spectra at environmental conditions.
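The grouping behind this figure can be sketched as follows: spectra are grouped by the wavelength position of their first overtone peak, and the 5th, 50th and 95th percentiles of absorbance at 1412 nm are computed within each group. This is a minimal Python sketch under stated assumptions, not the authors' R code. The function names, the 1400–1500 nm search window, the `min_count` threshold, and the inclusive percentile interpolation are all hypothetical choices. The spectra are assumed to be mean-centered beforehand when they are passed to `variation_by_peak`.

```python
from collections import defaultdict
from statistics import quantiles


def peak_position(wavelengths, spectrum, lo=1400.0, hi=1500.0):
    """Wavelength of the absorbance maximum inside [lo, hi] (assumed window)."""
    window = [(w, a) for w, a in zip(wavelengths, spectrum) if lo <= w <= hi]
    return max(window, key=lambda pair: pair[1])[0]


def variation_by_peak(wavelengths, spectra, peak_positions,
                      probe_nm=1412.0, min_count=2):
    """5th/50th/95th percentiles at `probe_nm`, grouped by peak position.

    `peak_positions[i]` is the peak wavelength of spectrum i, for example
    obtained with `peak_position` on the spectrum before mean-centering.
    """
    # Index of the grid point closest to the probe wavelength.
    probe = min(range(len(wavelengths)),
                key=lambda i: abs(wavelengths[i] - probe_nm))
    groups = defaultdict(list)
    for spectrum, position in zip(spectra, peak_positions):
        groups[position].append(spectrum[probe])
    summary = {}
    for position, values in groups.items():
        if len(values) >= min_count:  # skip sparsely populated positions
            cuts = quantiles(values, n=20, method="inclusive")
            summary[position] = (cuts[0], cuts[9], cuts[18])
    return summary
```

The paper restricts the library analysis to positions containing more than 100 spectra, so a call on a large library would set `min_count=101` rather than the small default used here.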


Fig. 8. Spectra of glycine solutions also change even when their peak position doesn't. Panels on the left half display the data for the concentration level of 15 mM while panels on the right display data for 7 mM.

(central panel). The graph titled “Densities of Consecutives” shows that the first 3–4 consecutive spectra are responsible for the strong left tail in the distribution of absorption intensities (shown in the graph titled “Densities of Absorptions”). This asymmetric distribution presents difficulties because: (1) it originates from a relatively non-linear rate of change in absorption between consecutive

spectra, and (2) it deviates from symmetrical Gaussian distributions, which are assumed in the majority of parametric multivariate analysis and experimental design methods. Initially, it appears that a more symmetrical distribution could be achieved with the manipulation of the number or order of consecutive spectra. However, this may only correct the outcome while leaving the root

Fig. 9. Results of spectral subtraction for compared methods. Each curve represents one experiment. Black lines illustrate the conventional method. Gray lines illustrate the new method. Absorption scale for the new method is given on the right side of each panel (in dark gray). For clarity, spectra below 125 mM are shown separately. Vertical lines mark the wavelength positions of 1420 and 1550 nm. Panel I shows labels for four main bands that are positioned around 1369, 1393, 1423 and 1445 nm, respectively. The bands below 1200 nm (panels K, L) arise due to low signal intensity and are neglected.


The last analysis in this section refers to the wavelength positions of the first overtone peak of water. Fig. 7 (left upper and lower panels) shows that, even under relatively stable temperature conditions, water spectra that share the same wavelength position of the main peak in the first overtone region exhibit an exceptionally large variation at 1412 nm (±1 × 10⁻² a.u.). In order to confirm the important implications derived at this point, the sample of pure water spectra was extended to a library containing over 9000 spectra (see the Materials and methods section). The assessment was limited to statistically significant cases by selecting only wavelength positions that contained more than 100 spectra. The findings confirm that the same range of variation applies to a much larger sample (Fig. 7, right upper and lower panels). In this analysis, the SNV-corrected spectra were used. The same amount of variation was observed for the glycine solution spectra at the two lowest concentration levels (Fig. 8) as well as at higher concentration levels (Figs. S2–S4). Scaled/shifted spectra do not account for a substantial residual variation around 1412 nm (and consequently 1492 nm, see Fig. 4), which is why the authors believe that the scaling/shifting of spectra

Fig. 10. New method improves band visualization. Additional bands are well resolved in the difference spectra obtained by the new method.

cause unaddressed, and/or obscure the ways in which improvements might be achieved. Please note that all concentration levels of glycine solutions are displayed on the right panel. As seen from extensive overlaps between water and glycine spectra (Fig. 6, “Distribution of Consecutives”), in most cases the distance between bands of solvent and solutions can be minimized, since water spectra cover the entire range of variations in the glycine spectra.

Fig. 11. Distribution of variance across the utilized wavelength range for the SNV-corrected spectra. Spectra are shown in gray and standard deviation in black. The gray vertical line shows the position of the 1550 nm wavelength. The majority of the variation is located within the region below 1550 nm, and the two maxima are at 1412 and 1492 nm. Negative values of absorption are only a consequence of applying the SNV transform.
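The origin of the negative values can be seen from the SNV transform itself, which centers each spectrum on its own mean and scales it by its own standard deviation. A minimal Python sketch is given here for illustration. It assumes the usual definition of SNV with the sample (n − 1) standard deviation, and the function name is hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: (a - mean) / standard deviation, per spectrum."""
    center = mean(spectrum)
    scale = stdev(spectrum)  # sample standard deviation (n - 1), an assumption
    if scale == 0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(a - center) / scale for a in spectrum]


if __name__ == "__main__":
    # All raw absorbances are positive, yet values below the mean of the
    # spectrum become negative after the transform.
    print(snv([1.0, 2.0, 3.0]))
```

Because every transformed spectrum has zero mean, roughly half of its points fall below zero regardless of the sign of the raw absorbances.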

Fig. 12. Improvements in precision using the new method are large. First, peak maxima of the band around 1420 nm (new method) and peak minima of the band around 1400 nm (classical method) were calculated. The precision of the values thus extracted is evaluated as the standard deviation expressed as a percentage of the median, using the following formula: 100 × Standard_Deviation/Median. These data are presented in panel A for the classical spectra (dashed line) and the new spectra (full line). To illustrate the improvement more precisely, we divided those values and presented in panel B the ratio of New/Classical precisions, which shows how many times better the new method is than the old one; on average, the precision is 5 times better for the new method. Panel C illustrates the results obtained using the experimental closest spectra. The horizontal line marks the value of 1, so everything above it represents a ratio larger than one, i.e. an improvement. Results at 1000 mM may therefore look poorer, but a visual inspection of Fig. S1 shows that both types of spectra have good consistency and that differences at this level are negligible. Improvements are found for all other concentration ranges and they point to 4 times better precision, on average. The extreme value at 62 mM is caused by unusually low variation for the new spectra.
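The precision measure in this caption can be written out as a short Python sketch. The formula 100 × Standard_Deviation/Median is taken from the caption; everything else is an assumption made for illustration. In particular, the absolute value of the median is used because the classical values are band minima, and the ratio is formed as classical over new so that values above 1 indicate an improvement for the new method, in line with the reading of panel B given in the caption.

```python
from statistics import median, stdev


def percent_precision(values):
    """Standard deviation expressed as a percentage of the median."""
    # abs() keeps the percentage positive for band minima (an assumption).
    return 100.0 * stdev(values) / abs(median(values))


def improvement_ratio(classical_values, new_values):
    """How many times smaller the relative spread is for the new method."""
    return percent_precision(classical_values) / percent_precision(new_values)


if __name__ == "__main__":
    # Hypothetical peak values from three experiments at one concentration.
    classical = [8.0, 10.0, 12.0]
    new = [9.0, 10.0, 11.0]
    print(percent_precision(classical), percent_precision(new))
    print(improvement_ratio(classical, new))
```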


3.4. The spectral variation for glycine solutions

Fig. 13. Absorption intensity at 1420 nm explains the difference in spectral patterns. The inset in the upper right corner shows the absorbance values at 1420 nm for all concentration levels. Each line represents one experiment. Numbers within lines label experiments in the order of their execution. The dashed line represents the absorption level of pure water at 1420 nm. The main figure is a magnification of the concentration region between 7 and 62 mM. The gray rectangle labels the region with ±1 × 10⁻³ a.u. deviation from the value of pure water (dashed line).

alone does not guarantee a satisfactory performance. The question that remains to be answered is whether the classical averaged spectra can still deliver accurate and reproducible differences, despite instrumental baseline shifts (Figs. 2 and 3), asymmetrical distribution of absorbance (Fig. 6), and fluctuations in the overlapping spectra (Figs. 7 and 8). Do we need a better alternative? The answer is given by juxtaposing the variations between classical and new subtracted spectra.
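The two subtraction schemes being juxtaposed can be sketched in a few lines. The paper describes the new method as forming all (solution − pure solvent) differences and keeping the one with the smallest area under the curve; the Python sketch below follows that description for a single solution spectrum. It is an illustration under stated assumptions rather than the authors' R implementation: the function names are hypothetical, and the area is taken as the trapezoidal integral of the absolute difference, which the paper does not specify.

```python
def subtract(a, b):
    """Pointwise difference of two spectra on the same wavelength grid."""
    return [x - y for x, y in zip(a, b)]


def absolute_area(wavelengths, diff):
    """Trapezoidal area under |diff| (the absolute value is an assumption)."""
    return sum(
        0.5 * (abs(diff[i]) + abs(diff[i + 1])) * (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(diff) - 1)
    )


def classical_difference(solution, solvents):
    """Classical scheme: subtract the averaged solvent spectrum."""
    average = [sum(column) / len(column) for column in zip(*solvents)]
    return subtract(solution, average)


def closest_difference(wavelengths, solution, solvents):
    """New scheme: try every solvent spectrum, keep the smallest-area difference."""
    candidates = [subtract(solution, solvent) for solvent in solvents]
    return min(candidates, key=lambda d: absolute_area(wavelengths, d))


if __name__ == "__main__":
    # Hypothetical three-point spectra.
    grid = [1.0, 2.0, 3.0]
    solution = [1.0, 2.0, 1.0]
    solvents = [[0.0, 0.0, 0.0], [1.0, 1.5, 1.0], [2.0, 2.0, 2.0]]
    print(classical_difference(solution, solvents))
    print(closest_difference(grid, solution, solvents))
```

Applying `closest_difference` to every consecutive solution spectrum, with `solvents` drawn either from the current experiment or from a library, corresponds to the two variants of the new method described in the Materials and methods section.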

Spectra obtained using the standard and new methods are compared in Figs. 9 and 10 (see Fig. S5 for the experimental closest spectrum). Both methods produce nearly identical spectral patterns in the region above 1550 nm; the differences are located within the region below 1550 nm. Between 1100 and 1350 nm, several low intensity bands that are well resolved in the new spectra (for 1000 and 500 mM) are completely absent from the classical spectra (Fig. 10). These bands are also absent from the experimental closest spectra, which may serve as an argument for using the library in cases where relevant information is contained within this region. Standard deviations, calculated at each wavelength, are presented in Fig. 11 (see Fig. S6 for raw spectra). Two peaks, located at 1412 and 1492 nm, demonstrate that the largest variation is concentrated in the region between 1300 and 1550 nm, which explains why the difference spectra also vary most in this region. The absorption intensity of water's O–H bond (the strongest absorber in this region [15]) is very sensitive to changes in hydrogen bonds between water molecules, as well as to various environmental perturbations. The new method (gray spectra, Fig. 9) produced a pattern that appears consistently and varies only slightly with dilution. The classical method exhibits a much simpler pattern (black) whose shape is increasingly dispersed with dilution. Four major bands are visible in the new spectra (labeled in panel I of Fig. 9), while only one broad negative band is found in the classical spectra. For solutions below 125 mM, the signal range for the new spectra consistently decreases towards an interval whose order of magnitude is roughly ±1 × 10⁻³ a.u. (panels I, J) or even around ±1 × 10⁻⁴ a.u. (panels K, L). This level of precision is illustrated in Fig. 12 and is unprecedented for near-infrared spectra because, at the same

Fig. 14. Overlap between spectra of pure water and solutions at different concentrations. Panels A–F show averaged absorptions at 1412 nm for pure water (black) vs. glycine solutions (gray), while panels G–L show the same data at 1420 nm. Dashed lines represent threshold values for bands observed in the new spectra and are estimated from Fig. 1 with the following values (× 10⁻³ a.u.): 15 (250 mM), 7 (125 mM), 3 (62 mM), 2 (31 mM), 1 (15 mM), 0.5 (7 mM). Although the data for pure water and glycine may belong to the same distributions, their random values vary significantly, which prevents stable expression of solute-induced absorption bands.


Fig. 15. Differences in peak wavelength positions between pure water and glycine show different distributions for classical averaging (panel A), the intermediate method of experimental averaging (panel B), and the closest spectrum of pure water (panel C). Positive values of the wavelength difference mean that the water spectrum is red shifted with respect to the glycine spectrum; negative values mean that it is blue shifted.

concentration levels, the signal range for classical spectra is 2–10 times higher, as estimated from the standard deviations. Since unstable spectral patterns and signal ranges are observed for the classical spectra, we investigated the connection between the absorbance range and the shape of the spectral pattern. On the basis of the data shown in panels I–K of Fig. 9, we selected ±1 × 10⁻³ a.u. as the threshold for difference spectra at which bands must be identified, and we monitored the absorbance at 1420 nm in the non-subtracted (SNV corrected) data (Fig. 13), because this is the position of the central positive band in the spectra subtracted using the new method. Data at concentration levels of 31 and 62 mM show that spectra from experiments 1 and 6 have significantly lower values than the remaining spectra. This corresponds to the situation in panels E and F of Fig. 9, where we observe two spectra with stronger negative values at 1420 nm. These two curves have similar shapes that clearly differ from the shapes of the other curves. The same data in Fig. 13, now for 15 mM, show that four experiments are above the threshold (2, 3, 5 and 8), one experiment is below it (1), and three experiments are within the threshold boundaries (4, 6, 7). The situation is identical to what is shown in panel G of Fig. 1. The situation at 7 mM (Fig. 13) shows that

three experiments fall within the threshold values, but this is insufficient because the threshold values decrease with dilution; the spectra that are closest to the abscissa are also not markedly different from the other spectra in this panel. Panels K and L clarify that the threshold needs to be lowered to ±1 × 10⁻⁴ a.u., but spectra this close to the abscissa are not found with the classical method at 7 mM. However, the spectrum that is closest to the abscissa is magnified in the inset of panel H, and it is visible that, although slightly corrupted by noise, a very weak band can be observed at 1420 nm and the emerging spectral shape begins to resemble the pattern observed in panel L. A nearly identical comparison applies to the spectra for which the experimental closest spectrum (Fig. S5) was used. Spectra that share their position relative to the threshold area also share a common spectral pattern, which results from their smaller distance from the abscissa. Also, spectra within the threshold range exhibit a more informative spectral pattern with a higher number of spectral features. This suggests that a flexible approach to spectral subtraction is required for reproducible and informative presentation of solute-induced bands across a wide concentration range. Closer observation of the classical spectra of moderately concentrated solutions (above 125 mM, panels A–D, Fig. 9) shows that: (1) the band around 1420 nm is actually reproduced, but only as a weak “shoulder” of the wide band centered around 1400 nm, and (2) the negative peak around 1400 nm is actually a shifted version of band number 2 in panel I of Fig. 9. These observations show that the new and classical spectra have similar features that are clearly isolated only in the new spectra. These differences are caused not just by the threshold values, but also by the angle at which the spectral curve of pure water intersects the spectrum of solute between 1300 and 1550 nm.
This angle is directly correlated with the wavelength position of the maximum absorption in the non-subtracted (SNV corrected) spectra: red shifting lowers the angle while blue shifting increases it. To illustrate this connection, we compared the wavelength positions of water peaks for three cases of subtraction: classical, new, and an additional (intermediate) one, in which the water spectrum was averaged for each experiment separately. The reason for introducing the intermediate method is the possibility that environmental variations are better compensated for by solvent spectra taken under similar environmental conditions. The results summarized in Fig. 15 compare the wavelength positions of the first overtone peaks of pure water and glycine solutions around 1450 nm (SNV spectra). Differences in wavelength positions (difference = pure water − glycine solution) are displayed as histograms (over all concentration levels). It is easy to observe the opposing results of the classical (panel A) and new method (panel C), as well as the intermediate results of the intermediate subtraction method (panel B). The closest spectra of pure water, obtained with the new method, exhibit a red shift from the spectrum of solute. The reason can be easily understood if we look back at Fig. 11: the short-wavelength branch of the first overtone peak is the main contributor to the overall spectral variance; for the difference between water and solution to be minimized, the slopes of the short-wavelength branches need to overlap as much as possible. This is achieved when the spectrum of pure water (which is usually higher in absorbance) is slightly red shifted with respect to the spectrum of the solution. This red shift changes the angle of intersection between the spectra that are being subtracted and facilitates the exposure of the band around 1420 nm.
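The peak-position comparison described above can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions: the function names, the 1400–1500 nm search window, and the toy absorbance values are hypothetical and are not taken from the study. It locates the sampled wavelength of maximum absorbance for a water spectrum and a solution spectrum and tallies the difference (pure water − solution), so that positive values indicate a red-shifted water spectrum.

```python
from collections import Counter
from typing import Sequence


def peak_wavelength(wavelengths: Sequence[float], absorbance: Sequence[float],
                    lo: float = 1400.0, hi: float = 1500.0) -> float:
    """Return the sampled wavelength of maximum absorbance within [lo, hi] nm.

    The search window is an assumption made for this sketch.
    """
    window = [(w, a) for w, a in zip(wavelengths, absorbance) if lo <= w <= hi]
    if not window:
        raise ValueError("no data points inside the search window")
    return max(window, key=lambda pair: pair[1])[0]


def peak_shift(wavelengths: Sequence[float], water: Sequence[float],
               solution: Sequence[float]) -> float:
    """Difference = pure water - solution; positive means water is red shifted."""
    return peak_wavelength(wavelengths, water) - peak_wavelength(wavelengths, solution)


# Toy data (hypothetical values chosen only for illustration).
wavelengths = [1440.0, 1445.0, 1450.0, 1455.0, 1460.0]
solution = [0.80, 1.00, 1.10, 1.00, 0.90]      # peak at 1450 nm
water_candidates = [
    [0.80, 0.90, 1.00, 1.10, 1.00],            # peak at 1455 nm (red shifted)
    [0.90, 1.10, 1.00, 0.90, 0.80],            # peak at 1445 nm (blue shifted)
]

shifts = [peak_shift(wavelengths, w, solution) for w in water_candidates]
histogram = Counter(shifts)                     # counts per wavelength difference
print(shifts)
```

The sketch works on the sampled wavelength grid only; resolving shifts smaller than the grid spacing would presumably require interpolation around the maximum.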
The band around 1420 nm is of great importance since we previously demonstrated that it is a central quantifier of the solute-solvent interaction but also of cooperative hydrogen bonding, which is an effect that transcends one-to-one interactions and models many-body interactions between solute and several surrounding water/solute molecules [19].

Fig. 16. Principal Component Analysis. Loading plots are shown for SNV corrected spectra (panel A), spectra subtracted using the classical method (panel B), and spectra subtracted using the closest spectrum of pure water (panel C). Lower left corners identify the order of PC and the amount of variance explained by that PC. Variance is significantly redistributed in favor of higher PCs.

Differences between the two approaches are further studied using the loading plots derived from principal component analysis (PCA, Fig. 16). Panels A through C show a gradual change in the distribution of variance among PCs, with an increase in the amount of variance that originates from higher PCs. The reason is the removal of components that describe the averaged spectrum (PC1) and temperature-related changes (PC2) [16–18], which usually correlate insufficiently with detailed solute-specific features. This effect was already mentioned in the analysis of background spectra (Fig. 2). Here we observe how spectral subtraction simultaneously enhances and simplifies multivariate analysis. Analytical spectroscopists are often reluctant to build multivariate models that include higher numbers of principal components (PCs) because of commonly recognized risks that may render the prediction performance unstable and include, but are not limited to:
1. Presence of a small number of (borderline) outlying samples that may define the higher PCs and therefore bias the model.

2. Distorted distribution of variance, in which higher PCs explain only a minute fraction of the total variance. These PCs often yield valuable improvements in prediction performance, but the amount of variance that they explain usually falls a few orders of magnitude below that of the first and/or second PC, which often creates difficulties in choosing the optimal number of PCs and may cause over-fitting.
3. Loading plots for higher PCs point to bands that are often not visible in pre-processed spectra, may correlate poorly with the commonly observed bands, and might cause arbitrary assignments, over-fitting, and misleading interpretations.

This demonstrates that the new spectral subtraction can significantly: (1) redistribute the variance over a larger number of PCs, and (2) reduce the number of PCs required for a suitable model performance while simultaneously improving precision, which provides the basis for increasingly confident spectral insights and interpretations. Most importantly, the similarity between patterns observed in PC4 (panel A), PC3 (panel B), and PC1 (panel C) serves as the main evidence that: (1) the new pattern truly characterizes the


variability present in the non-subtracted (SNV corrected) spectra, and (2) it successfully eliminates spectral components that are less specific to solute-solvent interactions. Additionally, PC4 (panel A) and PC3 (panel B), which accounted for only 0.01 and 0.69% of total variance, respectively, now account for 71% of the variance. This also promotes other PCs (previously PC6–PC9), which are now supported by significantly larger fractions of total variance, and/or makes possible a large reduction in the number of PCs required to describe the crucial aspects of the model.
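The redistribution of variance among PCs can be illustrated with a small, self-contained sketch. It is not the analysis used in this work: the toy spectra, the variable names, and the use of power iteration with deflation (chosen here only to avoid external libraries; a standard SVD routine would normally be used) are all assumptions. The example builds spectra in which a large baseline-like offset hides a weak solute feature, and compares the fraction of variance per PC before and after subtracting a matching solvent spectrum.

```python
import math
from typing import List, Sequence


def explained_variance_ratio(rows: Sequence[Sequence[float]],
                             n_components: int = 2,
                             n_iter: int = 200) -> List[float]:
    """Fraction of total variance carried by the leading PCs (power iteration)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    total = sum(cov[i][i] for i in range(p))
    ratios = []
    for _ in range(n_components):
        # Deterministic, non-symmetric start vector, normalised to unit length.
        vec = [1.0 + 0.1 * j for j in range(p)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _step in range(n_iter):
            new = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in new))
            if norm <= 1e-12 * total:   # no variance left to extract
                break
            vec = [x / norm for x in new]
        cov_vec = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        eigenvalue = sum(v * cv for v, cv in zip(vec, cov_vec))  # Rayleigh quotient
        ratios.append(eigenvalue / total)
        # Deflation: remove the component just found from the covariance matrix.
        for i in range(p):
            for j in range(p):
                cov[i][j] -= eigenvalue * vec[i] * vec[j]
    return ratios


# Toy data (hypothetical): three "wavelength" channels, four measurements.
base = [1.0, 2.0, 1.5]                    # common spectral shape
offset_dir = [1.0, 1.0, 1.0]              # baseline-like variation
solute_dir = [0.0, 1.0, -1.0]             # weak solute-induced feature
offsets = [-1.0, 1.0, -1.0, 1.0]
solute = [-0.01, -0.01, 0.01, 0.01]

solvent = [[base[k] + b * offset_dir[k] for k in range(3)] for b in offsets]
raw = [[solvent[i][k] + s * solute_dir[k] for k in range(3)]
       for i, s in enumerate(solute)]
subtracted = [[raw[i][k] - solvent[i][k] for k in range(3)] for i in range(4)]

ratios_raw = explained_variance_ratio(raw)
ratios_subtracted = explained_variance_ratio(subtracted)
print(ratios_raw, ratios_subtracted)
```

In this constructed case the solute feature carries only a minute fraction of the variance in the raw data and becomes the leading component once the matching solvent spectrum is removed, which mirrors, in an idealized way, the promotion of higher PCs described above.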

Author contributions

All authors have given approval to the final version of the manuscript.

Competing interests

Authors declare no competing financial interests.

Acknowledgments

4. Discussion

The analysis of instrumental shifting revealed that the entire spectral region around 1412 nm exhibits baseline shifts on the order of 0.7–1.4 × 10⁻³ a.u. (Fig. 2), which is comparable with band intensities in diluted solutions (±1 × 10⁻³ a.u. at 15 mM, and ±1 × 10⁻⁴ a.u. at 7 mM), suggesting that instrumental error alone is sufficient to obscure very weak bands and deteriorate the performance in diluted solutions. Spectral variance that originates from asymmetrical distributions of consecutive spectra (Figs. 6 and 8) is one to two orders of magnitude larger than the intensities of solute-induced bands in diluted solutions (≈ ±1 × 10⁻² a.u., see Fig. 7). The combination of these factors produced poor performance for the averaged spectrum at moderate and low concentrations, in contrast to our proposal, which provided, on average, a fourfold increase in precision. The main drawback of the new method is the need to acquire a library containing spectra taken under various, mainly temperature, perturbations, whose application might be perceived as random. To address this issue, Figs. 11, 14 and 15 are presented to offer a general guideline for acquiring a customized database of much smaller size. Spectra of solutions provide the wavelength position from which the customized database needs to be red shifted (0–5 nm), preferably using thermal perturbations to completely cover this range of wavelengths. An alternative solution is to use the solvent spectra taken during the course of the study. This option also demonstrated significant improvements for diluted solutions, showing that subtraction of the closest spectrum of solvent provides a direct pathway to extracting the solute-induced bands and to increasing reproducibility and precision; the approach is applicable to any solvent and free from any theoretical assumptions, mathematical models, or calibrations.
We expect that combining the closest-spectrum/library construction idea with the existing scaling/shifting approaches will result in even greater improvements in precision, because the two approaches complement each other.
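The selection step behind the closest-spectrum subtraction can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes that the "area under the subtracted curve" is evaluated as the trapezoidal area of the absolute difference, and the function names and toy spectra are hypothetical.

```python
from typing import List, Sequence, Tuple


def area_under_abs_difference(wavelengths: Sequence[float],
                              a: Sequence[float],
                              b: Sequence[float]) -> float:
    """Trapezoidal area under |a - b| along the wavelength axis (assumed criterion)."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    area = 0.0
    for i in range(1, len(wavelengths)):
        step = wavelengths[i] - wavelengths[i - 1]
        area += 0.5 * (diffs[i] + diffs[i - 1]) * step
    return area


def subtract_closest(wavelengths: Sequence[float],
                     solution: Sequence[float],
                     library: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Pick the solvent spectrum with the smallest area and subtract it."""
    areas = [area_under_abs_difference(wavelengths, solution, ref) for ref in library]
    best = min(range(len(library)), key=areas.__getitem__)
    difference = [s - r for s, r in zip(solution, library[best])]
    return best, difference


# Toy data (hypothetical values, not taken from the measurements).
wavelengths = [1400.0, 1410.0, 1420.0, 1430.0]
library = [
    [1.00, 1.20, 1.10, 0.90],   # solvent spectrum 0
    [1.02, 1.22, 1.12, 0.92],   # solvent spectrum 1 (baseline shifted)
]
solution = [1.02, 1.22, 1.13, 0.92]   # shares the shifted baseline, plus a weak band

best, difference = subtract_closest(wavelengths, solution, library)
print(best, difference)
```

With these toy values the baseline-shifted solvent spectrum is selected, and the difference spectrum retains only the weak band at 1420 nm; subtracting the other library entry would leave the baseline offset superimposed on that band.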

5. Conclusion

Evidence for large, volatile spectral changes in the low-concentration regime is presented, as well as the inability of a fixed, averaged spectrum of pure solvent to accommodate those changes and produce the precision required for adequate visualization of the extremely weak solute-induced bands encountered here. Application of chemometric methods can now be enhanced, since filtering out the sources of variation that are not primary to the solute-solvent interaction will promote the information content of higher PCs and increase the share of variance they explain. Finally, the authors are not aware of reasons why this method cannot be implemented in other spectroscopic techniques (vibrational, electronic, scattering, etc.) in which environmental variations (and bulk-solvent fluctuations) can produce measurable deviations.

Financial support from Suntory Global Innovation Center Ltd. program “Water Channeling Life” is gratefully acknowledged. The authors would like to thank the anonymous reviewers for their comments, because they significantly influenced the improvements in the quality of our work.

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.aca.2016.12.019.

References

[1] Å. Rinnan, F. van den Berg, S.B. Engelsen, Review of the most common pre-processing techniques for near-infrared spectra, Trac-Trend. Anal. Chem. 28 (10) (2009) 1201–1222.
[2] A. Candolfi, R. De Maesschalck, D. Jouan-Rimbaud, P.A. Hailey, D.L. Massart, The influence of data pre-processing in the pattern recognition of excipients near-infrared spectra, J. Pharm. Biomed. 21 (1999) 115–132.
[3] P. Lasch, Spectral pre-processing for biomedical vibrational spectroscopy and microspectroscopic imaging, Chemom. Intell. Lab. Syst. 117 (2012) 100–114.
[4] J.R. Powell, F.M. Wasacz, R.J. Jakobsen, An algorithm for the reproducible spectral subtraction of water from the FT-IR spectra of proteins in dilute solutions and adsorbed monolayers, Appl. Spectrosc. 40 (3) (1986) 339–344.
[5] F. Dousseau, M. Therrien, M. Pézolet, On the spectral subtraction of water from the FT-IR spectra of aqueous solutions of proteins, Appl. Spectrosc. 43 (3) (1989) 538–542.
[6] O. Kristiansson, J. Lindgren, J. de Villepin, A quantitative infrared spectroscopic method for the study of the hydration of ions in aqueous solutions, J. Phys. Chem. 92 (9) (1988) 2680–2685.
[7] H.J. Kleeberg, Spectroscopic investigation of ternary solutions: H2OCH3CNCu(ClO4)2, Mol. Struct. 237 (1990) 187–206.
[8] J.-J. Max, M. Trudel, C. Chapados, Subtraction of the water spectra from the infrared spectrum of saline solutions, Appl. Spectrosc. 52 (2) (1998) 234–239.
[9] M. Smiechowski, J. Stangret, Vibrational spectroscopy of semiheavy water (HDO) as a probe of solute hydration, Pure Appl. Chem. 82 (10) (2010) 1869–1887.
[10] K. Rahmelov, R. Hübner, Infrared spectroscopy in aqueous solution: difficulties and accuracy of water subtraction, Appl. Spectrosc. 51 (2) (1997) 160–170.
[11] R.J. Barnes, M.S. Dhanoa, S.J. Lister, Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra, Appl. Spectrosc. 43 (1989) 772–779.
[12] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016. https://www.R-project.org/.
[13] T. Hirschfeld, Wavenumber scale shift in Fourier transform infrared spectrometers due to vignetting, Appl. Spectrosc. 30 (5) (1976) 549–550.
[14] T. Hirschfeld, Instrumental requirements for absorbance subtraction, Appl. Spectrosc. 30 (5) (1976) 550–551.
[15] J. Workman Jr., L. Weyer, Practical Guide and Spectral Atlas for Interpretative Near Infrared Spectroscopy, second ed., CRC Press, Taylor & Francis, 2012, pp. 56–61.
[16] W.C. McCabe, S. Subramanian, H.F. Fisher, A near-infrared spectroscopic investigation of the effect of temperature on the structure of water, J. Phys. Chem. 74 (25) (1970) 4360–4369.
[17] H. Maeda, Y. Ozaki, M. Tanaka, N. Hayashi, T. Kojima, Near infrared spectroscopy and chemometrics studies of temperature-dependent spectral variations of water: relationship between spectral changes and hydrogen bonds, J. Near Infrared Spectrosc. 3 (1995) 191–201.
[18] V.H. Segtnan, S. Šašić, T. Isaksson, Y. Ozaki, Studies on the structure of water using two-dimensional near-infrared correlation spectroscopy and principal component analysis, Anal. Chem. 73 (13) (2001) 3153–3161.
[19] D. Kojić, R. Tsenkova, K. Tomobe, K. Yasuoka, M. Yasui, Water confined in the local field of ions, ChemPhysChem 15 (2014) 4077–4086.
