Linear programming applied to polarized Raman spectroscopy for elucidating molecular structure at surfaces

Linear programming applied to polarized Raman spectroscopy for elucidating molecular structure at surfaces

Chemometrics and Intelligent Laboratory Systems 196 (2020) 103898 Contents lists available at ScienceDirect Chemometrics and Intelligent Laboratory ...

1MB Sizes 0 Downloads 10 Views

Chemometrics and Intelligent Laboratory Systems 196 (2020) 103898

Contents lists available at ScienceDirect

Chemometrics and Intelligent Laboratory Systems journal homepage: www.elsevier.com/locate/chemometrics

Linear programming applied to polarized Raman spectroscopy for elucidating molecular structure at surfaces Fei Chen a, Kuo-Kai Hung a, Dennis K. Hore a, b, *, Ulrike Stege a, ** a b

Department of Computer Science, University of Victoria, Victoria, British Columbia, V8P 5C2, Canada Department of Chemistry, University of Victoria, Victoria, British Columbia, V8W 3V6, Canada

A R T I C L E I N F O

A B S T R A C T

Keywords: Linear programming Molecular structure Vibrational spectroscopy

We present a framework for using linear programming to solve a challenging problem in surface science, the elucidation of the structure and composition of adsorbed molecules from a mixture, using simulated data from polarized Raman experiments. In the past, methods applied in order to interpret such spectroscopic information were combinatorial approaches that are limited in scalability or accuracy. Quantum mechanical electronic structure calculations yield the optical response of a single molecule, from which spectra of a mixture can be determined by appropriate weighting. Furthermore, spectral obtained in different beam polarizations provide projections of the signal in the laboratory frame. We demonstrate that linear programming is an ideal tool for utilizing all of this information in order to provide the sought structural picture.

1. Introduction Chemical processes such as oxidation, corrosion, adhesion, and lubrication are all governed by molecular details at surfaces. Surface structural characterization therefore remains an active area of research in the physics, chemistry, and biotechnology communities. One of the key challenges is to elucidate the surface structure, as molecules can vary significantly in terms of their composition, orientation, and conformation from what is found in the bulk materials [1]. Many experimental techniques have been applied in the study of molecular orientation at surfaces. Among them, optical methods are advantageous as they are rapid, non-destructive, and can be applied to any surface accessible to light, including buried interfaces. The subset of vibrational techniques is particularly attractive due to the possibility of sub-molecular level information from characteristic vibrational frequencies of chemical functional groups. These techniques generally include infrared absorption and Raman scattering. Although both methods have been used to characterize molecular orientation based on analysis of spectral data obtained using polarized light, Raman scattering has some inherent advantages over infrared absorption. Since it generally uses visible light, it can penetrate aqueous phases without attenuation. Also, as Raman scattering probes a higher-order response tensor than IR absorption, it is generally more sensitive to molecular orientation [2–5]. This field is especially

active for the characterization of molecular orientation in polymers [6–9], but has also been illustrated for biological molecules [10]. The Raman response of surface-adsorbed species is generally weak on account of the small probe volume in the case of monolayers, but this can be compensated by techniques such as surface-enhanced Raman scattering [11] and resonance Raman scattering [12–15]. Quantitative analysis of molecular orientation using Raman spectral data has also been demonstrated for graphene [16] and organic species on carbon electrode surfaces [17]. The primary challenge associated with characterizing adsorbed molecules on surfaces is that—due to the surface preference of a particular species—the surface concentration may be different from the bulk concentration. In addition, while the bulk exhibits an isotropic orientation distribution, most molecules land on the surface with a marked preference in their orientation. Due to the difficulty this introduces in producing experimental benchmark results, we are interested in a computational method whose performance we can evaluate without the existence of such data. Different optimization methods have been applied in order to infer molecular structure from spectra, including combinatorial approaches such as exhaustive search [18], truncated Newton method [19], Broyden-Fletcher-Goldfarb-Shanno (BFGS) [20,21], genetic algorithms, simulated annealing, and Monte Carlo-Metropolis [22]. While such

* Corresponding author. Department of Chemistry, University of Victoria, Victoria, British Columbia, V8W 3V6, Canada. ** Corresponding author. E-mail addresses: [email protected] (D.K. Hore), [email protected] (U. Stege). https://doi.org/10.1016/j.chemolab.2019.103898 Received 16 July 2018; Received in revised form 21 May 2019; Accepted 21 November 2019 Available online 25 November 2019 0169-7439/© 2019 Elsevier B.V. All rights reserved.

F. Chen et al.

Chemometrics and Intelligent Laboratory Systems 196 (2020) 103898

chemistry package [33] at the B3LYP/6-31G(d,p) level, as it is noted that this basis set has been demonstrated to reproduce vibrational spectra of amino acids [34]. A polarizable continuum model was used to provide an approximation of an aqueous solution. The frequencies were scaled by 0.96, typical for this basis set [35]. Although such scaling is not strictly necessary in this study, as target mixtures and individual components are calculated using the same approach, it enables more facile comparison with vibrational frequencies typically observed in experiments. We consider six amino acids with structures shown in Fig. 1: methionine (Met), leucine (Leu), isoleucine (Ile), alanine (Ala), threonine (Thr) and valine (Val). A Raman experiment with a j-polarized excitation source that collects the i-polarized component of the scattered radiation generates a signal that can be described according to

methods have been applied to elucidating structural information from spectroscopic data, they each have limitations on accuracy, and on the size and complexity of the problem they can address. BFGS is a hill-climbing method that is not guaranteed to globally converge [23]. Another popular method for such problems is the truncated Newton method that often performs well. However, as we will demonstrate in Section 6, the truncated Newton method can end in a local optimum that is far from the desired solution, which causes a problem for our application. Genetic algorithms, Simulated Annealing and Monte Carlo-Metropolis are all heuristic approaches where, again, an accurate result is not guaranteed. While exhaustive search methods can be accurate, they typically do not scale; this will be discussed further in Section 6. When combined with other methods to improve performance for larger problems they sacrifice some accuracy. In general, the scalability of these approaches is limited as the number of molecules, the number of surface-adsorbed states, and the desired level of structural resolution increases. Linear programming (LP) has the advantage of being a fast, scalable method that delivers accurate results [24–26]. An LP approach was previously successfully used to study the composition of a mixed ribonucleic acid (RNA) sample using ultraviolet–visible (UV) absorption spectroscopy [27]. White et al. also applied LP to UV absorption spectroscopy in order to study the concentration of the chemical entities in the mixture when each existing component’s spectrum is known [28]. In recent work, LP was used to eliminate the background and noise in Raman spectra by converting traditional polynomial curve fitting with constraints into an LP problem [29]. The standard form of LP is a minimization problem that has a linear objective function and a number of constraints [30]. LP is a deterministic process that is guaranteed to converge to a single global optimum if there is a bounded solution space [24,26,31]. Furthermore, current LP solvers can deal with hundreds of thousands of variables [32]. Here we provide the first demonstration that LP may be applied to the challenging problem of elucidating molecular structure at surfaces using structurally-sensitive polarized Raman spectroscopy. The method is quick and straightforward to implement. However, in contrast to other approaches, our suggested LP approach is scalable and accurate.

h i ð3Þ X Im αijij ðθÞ  Γq Iij ðθÞ ¼  2 Δω  ωq þ Γ2q q

(1)

where Iij represents ij-polarized Raman intensity, obtained by a sum over q vibrational modes with frequency ωq . Here Γq is the homogeneous line width, and mq is the reduced mass of each normal mode. Since we consider an azimuthally-isotropic distribution of molecules (ϕ and ψ angles uniformly distributed), only four of the nine elements ij are ð3Þ

unique: xx, xy, xz and zz. In Eq. (1), Δω is the Stokes Raman shift, and αijij

is an element of the third-order molecular response function, formally a 3  3  3  3 forth-rank tensor (81 elements). In the Raman scattering process, when the excitation laser source is far from any electronic resonance, the response may be approximated in terms of a 3  3 secondrank tensor aR 1 ∂αij 1 ∂αij  pffiffiffiffiffiffiffiffiffiffiffi : αð3Þ ijij  aR;ij ¼ pffiffiffiffiffiffiffiffiffiffiffi 2mq ω ∂Q 2mq ω ∂Q

(2)

The correspondence between these quantities of rank-2 and rank-4 is seen in the outer product in Eq. (2). In the above expression, we also make use of the harmonic approximation where the linear polarizability αij is evaluated for various geometries Q by means of the electronic structure calculations to provide ð∂αij =∂Qq Þ through finite difference methods [18]. This results in

2. Generation of the Raman spectra 2.1. Single molecule

aR;ij ¼

All calculations were performed using the GAMESS quantum

  1 ∂αij 2 2mq ωq ∂Qq θ

(3)

We now consider the orientation of these molecules when adsorbed onto the surface, with the plane of the surface being the xy plane, and z as the surface normal vector. We consider i; j; k to be any of the laboratory frame Cartesian x; y; z-coordinates, and l; m; n to be any of the moleculefixed a; b; c coordinates. As shown in Fig. 1, the tilt angle θ is defined as the angle between c and z. In order to generate the Raman spectra for a molecule tilted at an arbitrary angle, ð∂αij =∂QÞθ is required, which can be obtained from the pre-rotated values ð∂αij =∂QÞθ¼0∘ via the coordinate transformation 

∂αij ∂ Qq

2 ¼ θ

1 4π 2

Z

2π 0

Z

2π 0

a;b;c X a;b;c X l

m

 Dil Djm

∂αlm ∂Q

 θ¼0∘

!2 dφ d ψ

(4)

where Dil are elements of the direction cosine operator [5,36].All test cases in our study are limited to considering only the tilt angle, and assume isotropy in the azimuthal and twist angular distributions. The subsequent integration over φ and ψ creates the θ-dependent properties of interest. In other words, the molecules have no preference as to how they are oriented in the xy plane of the surface. The further simplification is made where the orientation distribution is infinitely narrow, that is, all molecules of a given type are oriented with the same angle θ. In the case where a distribution of molecular orientations is desired in the model, for

Fig. 1. Structure of the six amino acids considered in this study, in their zwitterionic form. In the case of Ala, the relationship between the molecule-fixed and laboratory-frame coordinate system is illustrated, with the tilt angle θ defined as the angle between c and z. 2

F. Chen et al.

Chemometrics and Intelligent Laboratory Systems 196 (2020) 103898

Fig. 2. Top row: Raman spectra in (from left to right) xx, xy, xz and zz polarization schemes, taken for a representative snapshot consisting of 3.2% Val with θ ¼ 70∘ , 74% Thr with θ ¼ 20∘ , 20% Ala with θ ¼ 40∘ , 0.17% Ile with θ ¼ 0∘ , 1.8% Leu with θ ¼ 70∘ , and 1.1% Met with θ ¼ 60∘ as indicated in Table 1. In order to visualize the candidate spectra more clearly, they are plotted without their relative weighting, i.e. each at 100%. Bottom row: the weighted average of the six molecules in the mixture.

example a Gaussian distribution of tilt angles about a mean tilt angle, or the inclusion of twist angles for planar structures, Eq. (4) may be modified accordingly. Raman spectra for the six candidate molecules in each of the four experimental polarizations are shown in the top row of Fig. 2, with each oriented at a specific tilt angle as indicated.

Sij ¼ min

ℓ¼1

k¼1

(6)

θ

Here, n denotes the number of candidates, p denotes the total number of points in the spectral data, xðℓ; θÞ are the unknown fractions of each candidate that we seek to discover, and the sum of all xðℓ; θÞ is unity. Tij;k refers to the kth selected data point (frequency of the Raman Stokes shift) of the ij-polarized target spectrum. Iij;k ðθÞ denotes the spectral intensity of each candidate at a selected data point. For each data point, the absolute residual between the target spectrum and the one composed by the decision variables is calculated. The objective function minimizes the sum of the absolute residuals over all the data points. The overall score is then given by the sum over all participating ij, in our case the four sets of beam polarizations

2.2. Collection of molecules We define the term candidate as a specific molecule with a specific θ value. To simulate the Raman response of a collection of molecules, we perform a weighted average of the individual spectra. Since this is the result that we seek to determine using an LP model, from the data obtained for the individual candidates, we refer to the superposition as the target. The Raman spectrum of the target is given by n X X Tij ¼ f ðℓ; θÞ  Iij ðℓ; θÞ

p  n X  X X   xðℓ; θÞ  Iij;k ðℓ; θÞ Tij;k 

S ¼ Sxx þ Sxy þ Sxz þ Szz (5)

(7)

with

as depicted in Fig. 2. To transform the above into LP format, we next eliminate the absolute signs by introducing one more variable Ak for each data point.

n X X f ðℓ; θÞ ¼ 1:

n X   X   Ak ¼ Tij;k  xðℓ; θÞ  Iij;k ðℓ; θÞ

ℓ¼1

ℓ¼1

θ

ℓ¼1

θ

In the example shown in Fig. 2, the sum of each column (considering each polarization scheme separately) is weighted according to the fraction f ðℓ; θÞ to produce the target spectrum that appears in the bottom row. The goal of our LP model is to reproduce the correct weighting of the candidates f ðℓ; θÞ and the tilt angle θ of each candidate from the four target spectra representing a single combination of the mixed surface species.

(8)

θ

which is equivalent to Ak  Tij;k 

XX xðℓ; θÞ  Iij;k ðℓ; θÞ

(9a)

ℓ¼1 θ

and Ak   Tij;k þ

XX xðℓ; θÞ  Iij;k ðℓ; θÞ:

(9b)

ℓ¼1 θ

3. Deriving the LP model

Then, the objective function in Eq. (6) can be rewritten as X Sij ¼ min Ak

The problem we are faced with can be described as follows. Given a target spectrum that is composed of a group of candidates with known spectral information, determine the true composition of the candidates that can accurately reconstruct the target. This can be expressed as a minimization problem as the score for an individual polarization ij of the Raman data

k¼1

with the following constraints:

3

(10)

F. Chen et al.

Chemometrics and Intelligent Laboratory Systems 196 (2020) 103898

A1  Tij;1 þ A1 þ Tij;1 

Ap  Tij;p þ

n X X xðℓ; θÞ  Iij;1 ðℓ; θÞ  0

ℓ¼1

θ

ℓ¼1

θ

ℓ¼1

θ

randomly picked one candidate from each amino acid’s ten candidates. Then we randomly assigned a percentage to the selected candidates, and the target is generated from Eq. (5). The remaining 54 candidates have no contribution to the target. An example of such a target is shown in Table 1. We also generate the Raman spectra for all was 60 candidates for input to our LP problem. The LP input data is then prepared from the above spectral information for input to Eq. (10). We use the simplex method (see, for example, Ref. [37]) in the GNU linear programming toolkit (GLPK) [38]. Although it has been shown that the simplex method is not a polynomial-time algorithm for general inputs, in practice it exhibits polynomial-time behaviour as it usually takes 2ne3n steps to solve a problem, where n is the number of the decision variables. In our experiments, the practical time complexity exhibited linear behaviour.

n X X xðℓ; θÞ  Iij;1 ðℓ; θÞ  0

⋮ n X X xðℓ; θÞ  Iij;p ðℓ; θÞ  0

(11)

n X X xðℓ; θÞ  Iij;p ðℓ; θÞ  0 Ap þ Tij;p  ℓ¼1 θ X X xðℓ; θÞ ¼ 1:

ℓ¼1 θ

The above minimization problem consists of an objective function and a number of constraints with decision variables to be determined, here fractions xðℓ; θÞ [25]. While our problem is now describe as an LP problem, it is not in so-called standard form. Since, to solve our specific instances, we used an implementation of the simplex method that takes LP standard form as input only, above problem is transformed into its standard form, where constraint inequalities in Eq. (11) become equalities by introducing new variables, called slack variables (SV) A1  Tij;1 þ

n X X xðℓ; θÞ  Iij;1 ðℓ; θÞ þ s1 ¼ 0

A1 þ Tij;1 

n X X xðℓ; θÞ  Iij;1 ðℓ; θÞ þ s2 ¼ 0

Ap  Tij;p þ

ℓ¼1

θ

ℓ¼1

θ

n X X

Ap þ Tij;p 

The return composition of each test case in the set is obtained for each trial and compared with the target to calculate the sum of the residuals. If the sum is smaller than a certain threshold (set to 107 ), then the return composition is considered to be the same as the target one. We run this test 100 times with randomly selected target compositions in order to test the accuracy of our LP model, and the correct target composition was returned each time. This ensures that we are evaluating the technique in general, without the influence of a few randomly selected easy or difficult targets. An example of such a result is shown in the first two lines of Table 2. Here the target is generated from a subset of the candidates, so both are necessarily on the same scale. However, target spectra obtained from real experimental data in the lab are always measured in arbitrary intensity units. Furthermore, the candidates are most often calculated in the same way as done here, and so the scaling of the candidates is based on the quantum chemical calculations, also in arbitrary units. As a result, we must consider that the true target may be related to the calculated target (Eq. (13)) by a scaling factor fs , where fs ¼ 1  y. This scaling factor will always be unknown, as no experimental calibration is likely. A scheme without a scaling factor (i.e. scaling factor fs ¼ 1) would require experimental measurement of the individual candidates. While this has been demonstrated in solution for isotropic distributions of molecules [27], it is not practical for characterizing molecules on surfaces that are oriented with specific and narrow distributions of tilt angles.



xðℓ; θÞ  Iij;p ðℓ; θÞ þ s2p1 ¼ 0

ℓ¼1

5. Evaluation

(12)

θ

n X X xðℓ; θÞ  Iij;p ðℓ; θÞ þ s2p ¼ 0 ℓ¼1 θ XX xðℓ; θÞ ¼ 1

ℓ¼1 θ

s1 ; s2 ; …; s2p1 ; s2p  0: It has been shown that for any LP problem, there are only three kinds of possible solutions: feasible and bounded, feasible and unbounded, and infeasible ones. If the solution space is feasible and bounded, then there exists exactly one optimum solution. If it is feasible but unbounded, then there is a solution space with an infinite number of optimal solutions, while in the infeasible case no solution exists [26]. Assuming that we can obtain sufficiently precise data, the solution space of our LP instances is feasible and bounded. Therefore, a unique optimum solution exists, that is we can expect a unique optimal solution to each of our instances. In other words, solving the LP problem will yield exactly the expected target composition.

Table 2 Representative LP result in cases where the target and candidates are of the same scale (unscaled result), and where the target and candidates are scaled independently (scaled result).

4. Implementation In the first step, the molecular property information ∂α= ∂Q is generated from the electronic structure calculations. For a specific candidate and value of its tilt angle θ, the spectral information is obtained from Eq. (1). Each amino acid has ten candidates in the mixture, as the tilt angle was considered from 0∘ to 90∘ in steps of 10∘ . Therefore the number of possible combinations of candidates is 106 . For each test case, we

Molecule

Val

Thr

Ala

Ile

Leu

Met

Tilt angle target unscaled LP result (fs ¼ 1) scaled LP result (fs ¼ 0:6)

70∘ 0.032 0.032 0.019

10∘ 0.74 0.74 0.44

30∘ 0.20 0.20 0.12

0∘ 0.0017 0.0017 0.0010

70∘ 0.018 0.018 0.011

60∘ 0.0110 0.0110 0.0067

Table 1 Example of a target composition of the structures shown in Fig. 1 corresponding to the spectra shown in Fig. 2.

Val Thr Ala Ile Leu Met

0∘

10∘

20∘

30∘

40∘

50∘

60∘

70∘

80∘

90∘

0 0 0 0.0017 0 0

0 0 0 0 0 0

0 0.74 0 0 0 0

0 0 0 0 0 0

0 0 0.2 0 0 0

0 0 0 0 0 0

0 0 0 0 0 0.011

0.032 0 0 0 0.018 0

0 0 0 0 0 0

0 0 0 0 0 0

4

F. Chen et al.

Chemometrics and Intelligent Laboratory Systems 196 (2020) 103898

Table 3 The Truncated Newton method solution with uniform initial distribution. In the case of deviation from the exact solution (difference indicate in parenthesis). This solution is plotted with red lines in Fig. 3. 0∘

10∘

20∘

30∘

40∘

50∘

60∘

70∘

80∘

90∘

Val

0.0000

0.0000

0.0000

0.1751 ð0:1751Þ 0.0000

0.0085 ð0:0085Þ 0.0000

0.0081 ð  0:0239Þ 0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0228 ð0:0228Þ 0.0000

0.0000 0.0000

0.0000

0.0000

Leu

0.0000 ð  0:0017Þ 0.0000

0.0000

0.1303 ð  0:6097Þ 0.0056 ð0:0056Þ 0.0030 ð0:0030Þ 0.0000

0.0052 ð0:0052Þ 0.0338 ð0:0338Þ 0.0646 ð0:0646Þ 0.0000

0.0000

0.1925 ð0:1925Þ 0.0000

0.0040 ð0:0040Þ 0.0961 ð0:0961Þ 0.0698 ð  0:1302Þ 0.0000

0.0000

Thr

0.0058 ð0:0058Þ 0.1170 ð0:1170Þ 0.0361 ð0:0361Þ 0.0000 0.0000

0.0010

0.0000

0.0090 ð0:0090Þ

0.0000

0.0051 ð  0:0129Þ 0.0000

0.0000

0.0000

0.0055 ð0:0055Þ 0.0000 ð  0:0110Þ

0.0000

Met

0.0046 ð0:0046Þ 0.0000

0.0000

0.0000

Ala Ile

A1  ð1  yÞTij;1 þ

0.0018 ð0:0018Þ

n X X xðℓ; θÞ  Iij;1 ðℓ; θÞ þ s1 ¼ 0

ℓ¼1

θ

ℓ¼1

θ

optimization of a large number of parameters [39–41]. One of the trade-offs that comes with the speed of these algorithms is their reliance on an initial guess of the parameters. For the class of problem we are describing, this is usually not possible based on chemical intuition. We have implemented a truncated Newton method search using two

n X X xðℓ; θÞ  Iij;1 ðℓ; θÞ þ s2 ¼ 0 A1 þ ð1  yÞTij;1 

⋮ n X X xðℓ; θÞ  Iij;p ðℓ; θÞ þ s2p1 ¼ 0 Ap  ð1  yÞTij;p þ ℓ¼1

(13)

θ

n X X xðℓ; θÞ  Iij;p ðℓ; θÞ þ s2p ¼ 0 Ap þ ð1  yÞTij;p  ℓ¼1 θ XX xðℓ; θÞ ¼ 1  y

ℓ¼1 θ

s1 ; s2 ; …; s2p1 ; s2p  0: Repeating the same test cases, but this time with arbitrary scaling factors, we observe that the LP always outputs y > 0, and that the return composition indeed delivers the correct candidates. While, in this case, the fractions of the selected candidates are different from the ones in the target composition, they differ exactly by the same factor, fs . This is illustrated in the bottom row of Table 2 with fs ¼ 0:6, and hence y ¼ 0:4, noting that 0:019 0:44 0:12 0:001 0:011 0:0067 ¼ ¼ ¼ ¼ ¼ ¼ 0:6: 0:032 0:74 0:2 0:0017 0:018 0:011

(14)

When determining the fractions with a scaling factor as in Eq. (13), the value for (slack) variable y and therefore also scaling factor fs are obtained successfully in all test cases, and enabled reconstruction of the target composition. 6. Comparison with other methods As mentioned earlier, characterizing the structure of molecules on surfaces is a particularly challenging endeavor as the surface composition itself may vary from the adjacent bulk solution. For example, even a twocomponent mixture with 50% A and 50% B in solution may exhibit a preferential absorption so the surface consists of a monolayer that is 70% A and 30% B. Even if this could be predicted, we are immediately faced with the second part of this challenge, that molecules are rarely adsorbed in an isotropic manner; they are ordered with respect to the surface normal as a consequence of the anisotropic environment at the surface. We have described this aspect in terms of the tilt angle θ. For this reason, simulated spectra are especially valuable as they offer an opportunity to benchmark structure elucidation algorithms using known solutions. At the same time, this creates a challenge in the minimization of the objective function as we need to deal with a large number of unknowns. Even in the case of our relatively simple example with 6 amino acids, limiting the tilt angles to 0–90∘ in steps of 10∘ already results in 60 coefficients that need to be determined. The family of truncated Newton methods are commonly employed in such cases as they are especially well-suited to problems requiring the

Fig. 3. Comparison of known solution involving the composition and tilt angle of the surface molecules (black), result returned by truncated Newton method with uniform starting distribution (red) and random starting distribution (blue). Spectra have been offset vertically for clarity. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.) 5

F. Chen et al.

Chemometrics and Intelligent Laboratory Systems 196 (2020) 103898

Table 4 The Truncated Newton method solution with uniform initial distribution. In the case of deviation from the exact solution (difference indicate in parenthesis). This solution is plotted with blue lines in Fig. 3.

Val Thr Ala Ile

0∘

10∘

20∘

30∘

40∘

50∘

60∘

70∘

80∘

90∘

0.0058 ð0:0058Þ 0.4941 ð0:4941Þ 0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0026 ð0:0026Þ 0.0000

0.0000

0.0000

0.0000

0.0205 ð0:0205Þ 0.0000

0.0000

0.0000 ð  0:7400Þ 0.0000

0.0000 ð  0:0320Þ 0.0000

0.0000

0.0000

0.2009

0.0049 ð0:0049Þ 0.2493 ð0:2493Þ 0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

Leu

0.0036 ð0:0019Þ 0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0122 ð0:0122Þ

0.0000

0.0000 ð  0:0180Þ 0.0000

0.0000

Met

0.0159 ð0:0159Þ 0.0000

0.0000

0.0000

different starting conditions: a uniform distribution of molecules across all species, and a randomly generated concentration profile. In both cases, the distribution has been normalized to describe 100% of the population. In the case of the initial uniform distribution, the difference between the exact (known target) composition and the returned solution is shown in Table 3, and the resulting spectra of the mixture are shown with red lines in Fig. 3. The search starting with a random distribution of species produces differences from the correction solution shown in Table 4 and the blue spectra in Fig. 3. As can be observed, very different solutions can be obtained—and these can be far from the known solution in terms of the distribution of molecular species—yet all with arbitrarily small values of the objective function. We therefore demonstrate that an exact method is required. As an alternative to LP, an exhaustive search is a brute force method that scans the parameter space by testing every possible distribution of the molecules in question. We consider the number of molecules n, the number of tilt angles considered for each molecule m, and the desired composition resolution r. The number of combinations to be tested is determined by

0.0000 ð  0:0110Þ

[2] C. Sourisseau, Polarization measurements in macro- and micro-Raman spectoscopies: molecular orientations in thin films and azo-dye polymer systems, Chem. Rev. 104 (2004) 3851–3892. [3] M. Tanaka, R.J. Young, Polarised Raman spectroscopy for the study of molecular orientation distributions in polymers, J. Mater. Sci. 41 (2006) 963–991. [4] F. Lagugne Labarthet, Polarized measurements in Raman microscopy, Annu. Rep. Prog. Chem. 103 (2007) 326–350. [5] K.-K. Hung, U. Stege, D.K. Hore, IR absorption, Raman scattering, and IR-vis sumfrequency generation spectroscopy as quantitative probes of surface structure, Appl. Spectrosc. Rev. 50 (2015) 351–376. [6] S. Yang, S. Michielsen, Orientation distribution functions obtained via polarized Raman spectroscopy of poly(ethylene terephthalate) fibers, Macromolecules 36 (2003) 6484–6492. [7] F. Lagugne Labarthet, C. Sourisseau, Raman study of the photoisomerization and angular reorientation of azobenzene molecules in a DR1-doped polymer matrix, J. Raman Spectrosc. 27 (1996) 491–498. [8] M. Richard-Lacroix, C. Pellerin, Novel method for quantifying molecular orientation by polarized Raman spectroscopy: a comparative simulations study, Appl. Spectrosc. 67 (2013) 409–419. [9] M. Richard-Lacroix, C. Pellerin, Accurate new method for molecular orientation quantification using polarized Raman spectroscopy, Macromolecules 46 (2013) 5561–5569. [10] M. Tsuboi, J.M. Benevides, G.J. Tomas Jr., Raman tensors and their application in structural studies of biological systems, Proc. Jpn. Acad. Ser. B 85 (2009) 83–97. [11] D.V. Chulhai, L. Jensen, Determining molecular orientation with surface-enhanced Raman scattering using inhomogenous electric fields, J. Phys. Chem. C 117 (2013) 19622–19631. [12] T. Nakanaga, T. Takenaka, Resonance Raman spectra of monolayers of a surface–active dye adsorbed at the oil–water interface, J. Phys. Chem. 81 (1977) 645–649. [13] T. Takenaka, H. Fukuzaki, Resonance Raman spectra of insoluble monolayers spread on a water surface, J. Raman Spectrosc. 8 (1979) 151–154. [14] T. Takenaka, T. Nakanaga, Resonance Raman spectra of monolayers adsorbed at the interface between carbon tetrachloride and an aqueous solution of a surfactant and a dye, J. Phys. Chem. 80 (1976) 475–480. [15] T. Takenaka, Effect of electrolyte on the molecular orientation in monolayers adsorbed at the liquid–liquid interface: studies by resonance Raman spectra, Chem. Phys. Lett. 55 (1978) 515–518. [16] Z. Li, R.J. Young, I.A. Kinloch, N.R. Wilson, A.J. Marsden, A.P.A. Raju, Quantitative determination of the spatial orientation of graphene by polarized Raman spectroscopy, Carbon 88 (2015) 215–224. [17] Y.-C. Liu, R.L. McCreery, Raman spectroscopic determination of the structure and orientation of organic monolayers chemisorbed on carbon electrode surfaces, Anal. Chem. 69 (1997) 2091–2097. [18] S.A. Hall, A.D. Hickey, D.K. Hore, Structure of phenylalanine adsorbed on polystyrene from nonlinear vibrational spectroscopy measurements and electronic structure calculations, J. Phys. Chem. C 114 (2010) 9748–9757. [19] R. Roy, E. Sevick-Muraca, Truncated Newton’s optimization scheme for absorption and fluorescence optical tomography: Part I theory and formulation, Opt. Express 4 (1999) 353–371. [20] P. Partovi-Azar, T.D. Kühne, P. Kaghazchi, Evidence for the existence of Li2S2 clusters in lithium-sulfur batteries: ab initio Raman spectroscopy simulation, Phys. Chem. Chem. Phys. 17 (2015) 22009–22014. [21] S.A. Hall, K.C. Jena, T.G. Trudeau, D.K. Hore, Structure of leucine adsorbed on polystyrene from nonlinear vibrational spectroscopy measurements, molecular dynamics simulations, and electronic structure calculations, J. Phys. Chem. C 115 (2011) 11216–11225. [22] G. Maurin, P. Senet, S. Devautour, P. Gaveau, F. Henn, V.E. Van Doren, J.C. Giuntini, Combining the Monte Carlo technique with 29Si NMR spectroscopy: simulations of cation locations in zeolites with various Si/Al ratios, J. Phys. Chem. B 105 (2001) 9157–9161. [23] D.-H. Li, M. Fukushima, A modified BFGS method and its global convergence in nonconvex minimization, J. Comput. Appl. Math. 129 (2001) 15–35. [24] G.B. Dantzig, Reminiscences about the origins of linear programming, Oper. Res. Lett. 1 (1982) 43–48.

cðn; mÞpðrÞ , where cðn; mÞ determines the table size, and pðrÞ controls the number of tables to be evaluated. Thus exhaustive search results in an exponential running time. In our case with n ¼ 6 and a tilt angle resolution of 10∘ in the range θ ¼ 0–90∘ (m ¼ 10), even when avoiding repetition and limiting the composition resolution to 10%, this already results in a lower bound of 650 million combinations that need to be evaluated. This demonstrates that, even with the possibility of parallel computing on large clusters, such a brute force method does not compare with the efficiency of linear programming, evaluated in seconds on a workstation. Furthermore, LP has no inherent resolution restriction. 7. Conclusion We have demonstrated a framework for using linear programming to solve a challenging problem in surface science, the elucidation of the structure and composition of adsorbed molecules from a mixture. Polarized Raman spectroscopy provides a sensitive fingerprint of the composition and molecular orientation, but harnessing this rich molecular information requires modeling the spectral response and a method that can fit data when there are many possible candidates in the solution space. Due to its accuracy and scalability, linear programming is an ideal method to tackling this problem. Acknowledgement We thank the Natural Sciences and Engineering Research Council of Canada for support of this science with Discovery Grants to DKH and US. References [1] X. Zhuang, P.B. Miranda, D. Kim, Y.R. Shen, Mapping molecular orientation and conformation at interfaces by surface nonlinear optics, Phys. Rev. B 59 (1999) 12632–12640. 6

F. Chen et al.

Chemometrics and Intelligent Laboratory Systems 196 (2020) 103898

[25] N. Karmarkar, A new polynomial-time algorithm for linear programming, Combinatorica 4 (1984) 373–395. [26] V. Chvatal, Linear Programming, W. H. Freeman and Company, New York, 1983. [27] A.W. Pratt, J.N. Toal, G.W. Rushizky, Computer assisted analysis of oligonucleotides, Ann. N. Y. Acad. Sci. 128 (1966) 900–913. [28] W.C. Whiten, M.B. Shapiro, A.W. Pratt, Linear programming applied to ultraviolet absorption spectroscopy, Commun. ACM 6 (1963) 66–67. [29] S.-J. Baek, A. Park, A. Shen, J. Hu, A background elimination method based on linear programming for Raman data, J. Raman Spectrosc. 42 (2011) 1987–1993. [30] J. Matousek, B. G€ artner, Understanding and Using Linear Programming, SpringerVerlag, Heidelberg, 2007. [31] L.G. Khachiyan, Polynomial algorithms in linear programming, USSR Comput. Math. Math. Phys. 20 (1980) 53–72. [32] J.L. Gearhart, K.L. Adair, J.D. Durfee, K.A. Jones, N. Martin, R.J. Detry, Comparison of open-source linear programming solvers. Tech. Rep., Sandia National Laboratories, Albuquerque, New Mexico, October 2013. [33] M.W. Schmidt, K.K. Baldridge, J.A. Boatz, S.T. Elbert, M.S. Gordon, J.H. Jensen, S. Koseki, N. Matsunaga, K.A. Nguyen, S. Su, T.L. Windus, M. Dupuis,

[34] [35] [36] [37] [38] [39] [40] [41]

7

J.A. Montgomery Jr., General atomic and molecular electronic structure system, J. Comput. Chem. 14 (1993) 1347–1363. R. Linder, M. Nispeal, T. H€aber, K. Kleinermanns, Gas-phase FT-IR spectra of natural amino acids, Chem. Phys. Lett. 409 (2005) 260–264. J.R. Merrick, D. Moran, L. Radom, An evaluation of harmonic vibrational frequency scale factors, J. Phys. Chem. A 111 (2007) 11683–11700. S. Roy, K.-K. Hung, U. Stege, D.K. Hore, Rotations, projections, direction cosines, and vibrational spectra, Appl. Spectrosc. Rev. 49 (2013) 233–248. T.H. Cormen, C.E. Leiserson, R.L. Livest, C. Stein, Introduction to Algorithms, MIT Press and McGraw-Hill, 2001, pp. 790–804. Ch. The Simplex Algorithm. GNU linear programming kit, version 4.64. http://www.gnu.org/software/glpk/ glpk.html, 2017. J. Nocedal, S.J. Wright, Numerical Optimization, Springer, New York, 1999. S.G. Nash, Newton-type minimization via the lanczos method, SIAM J. Numer. Anal. 21 (1984) 770–778. S.G. Nash, A survey of truncated-Newton methods, J. Comput. Appl. Math. 124 (2000) 45–49.