Vibrational two-dimensional correlation spectroscopy (2DCOS) study of proteins


Accepted Manuscript

Isao Noda

PII: S1386-1425(17)30512-7
DOI: 10.1016/j.saa.2017.06.034
Reference: SAA 15250

To appear in: Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy

Received date: 19 March 2017
Revised date: 22 June 2017
Accepted date: 26 June 2017

Please cite this article as: Isao Noda, Vibrational two-dimensional correlation spectroscopy (2DCOS) study of proteins, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy (2017), doi: 10.1016/j.saa.2017.06.034

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Vibrational Two-Dimensional Correlation Spectroscopy (2DCOS) Study of Proteins

Isao Noda

University of Delaware
Newark, DE 19716, U.S.A.

E-mail: [email protected]

Abstract

A tutorial is provided for the generalized two-dimensional correlation spectroscopy (2DCOS), which is applicable to the vibrational spectroscopic study of proteins and related systems. In 2DCOS, similarity or dissimilarity among variations of spectroscopic intensities, which are induced by applying an external perturbation to the sample, is examined by constructing correlation spectra defined by two independent spectral variable axes. By spreading congested or overlapped peaks along the second dimension, apparent spectral resolution is enhanced and interpretation of complex spectra becomes simplified. A set of simple rules for the intensities and signs of correlation peaks is used to extract insightful information. Simulated IR spectra for a model protein are used to demonstrate the specific utility of 2DCOS. Additional tools useful in the 2DCOS analysis of proteins, such as data segmentation assisted with moving-window analysis, 2D codistribution analysis, Pareto scaling, and null-space projection are also discussed.

Keywords: two-dimensional correlation spectroscopy, spectral analysis, vibrational spectroscopy, protein

1. Introduction

Generalized two-dimensional correlation spectroscopy (2DCOS) is a versatile data analysis technique useful in a broad range of spectroscopic applications [1-4]. In 2DCOS, a spectral correlation intensity is obtained as a function of two independent spectral variables, like IR wavenumbers, for a sample under the influence of some externally applied perturbation, such as change in temperature or concentration. The applied perturbation induces systematic variations in the observed spectrum of the sample, which can be examined using a form of complex cross correlation analysis. 2DCOS analysis reveals the underlying similarity or dissimilarity among systematic variations in spectroscopic signal intensities. By spreading the congested or overlapped peaks along the second dimension, apparent spectral resolution is enhanced and interpretation of complex spectra becomes simplified. A set of rules can be applied to the intensities and signs of peaks appearing in 2D correlation spectra to rapidly extract insightful information. A series of comprehensive reviews spanning several decades covering numerous 2DCOS application examples are available [5-12].

It has been well known that the 2DCOS technique is especially well-suited for the vibrational spectroscopy analysis of complex molecules, like proteins, peptides and related molecules. Bogusława Czarnik-Matusewicz had recognized from the early days of her research career the promising potential of vibrational 2DCOS analysis using IR, Raman and NIR probes in the study of proteins. Her work in the field includes 2D NIR study of milk protein [13], 2D IR and 2D Raman correlation, as well as IR-Raman hetero-correlation studies of β-lactoglobulin [14-16], 2D NIR and 2D IR studies of human serum albumin [17, 18], and 2D IR study of α-lactalbumin [19], to name a few. The encyclopedia chapter compiled in 2014 with her long-time colleague Young Mee Jung [20] provides an excellent review of various aspects of applications of vibrational 2DCOS analyses for proteins. This article, dedicated to the memory of the late Boguska, provides a tutorial on the basic 2DCOS technique with emphasis on the protein research applications along with additional discussions on some newly introduced techniques relevant to the field. Simulated IR spectra for a model protein are used as an illustrative example to show the utility of standard 2DCOS analysis. Potentially useful tools, such as moving window analysis, codistribution analysis, Pareto scaling, and null space projection, are explained.

2. Generalized 2D Correlation Spectroscopy (2DCOS)

2.1 Constructing 2D correlation spectra

In vibrational 2DCOS study of proteins, a series of systematically evolving IR or NIR absorption or Raman scattering spectra are measured under the influence of some external perturbation applied to the sample. Physical, chemical or biological perturbations, such as rising temperature, effect of electrolyte concentrations, or even simple aging of the sample, can induce changes in the molecular state of proteins, such as conformational changes of the secondary structures, irreversible denaturation, or formation of aggregates and adsorbates. Those molecular changes, in turn, result in the variation of spectral features. A form of cross correlation analysis is applied to the spectra thus collected to extract pertinent information about the sample system [1-3].

Suppose a discretely sampled set of m vibrational spectra $A(\nu_k, t_i)$ is observed for a system under the influence of an external perturbation, which induces systematic changes in the spectral intensities. The spectral variable $\nu_k$ with $k = 1, 2, \ldots, n$ may be the IR wavenumber sampled over n discrete measurement points. The other variable $t_i$ with $i = 1, 2, \ldots, m$ represents the quantitative measure of the effect of the applied perturbation, e.g., temperature, additive concentration, or incubation time. Only the sequentially sampled spectral dataset observed within the explicitly defined interval between $t_1$ and $t_m$ will be used for the 2D correlation analysis. Here, the perturbation variable is assumed to be evenly sampled with a constant increment, i.e., $t_{i+1} - t_i = \text{constant}$. A procedure for handling unevenly sampled cases often encountered in the real world is described elsewhere [21].

Dynamic spectra $\tilde{A}(\nu_k, t_i)$ used in 2DCOS analysis are defined as

$$\tilde{A}(\nu_k, t_i) = \begin{cases} A(\nu_k, t_i) - \bar{A}(\nu_k) & \text{for } 1 \le i \le m \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

$\bar{A}(\nu_k)$ is the spectrum representing the reference state of the system. In the absence of a priori knowledge of the reference state, the reference spectrum is usually set to be the averaged spectrum over the observation interval between $t_1$ and $t_m$ given by

$$\bar{A}(\nu_k) = \frac{1}{m} \sum_{i=1}^{m} A(\nu_k, t_i) \tag{2}$$

With this specific choice of the reference spectrum, the portion of dynamic spectra within the observation interval becomes equivalent to the mean-centered spectra. Synchronous and asynchronous 2D correlation spectra, $\Phi(\nu_1, \nu_2)$ and $\Psi(\nu_1, \nu_2)$, are given by

$$\Phi(\nu_1, \nu_2) = \frac{1}{m-1} \sum_{i=1}^{m} \tilde{A}(\nu_1, t_i) \cdot \tilde{A}(\nu_2, t_i) \tag{3}$$

$$\Psi(\nu_1, \nu_2) = \frac{1}{m-1} \sum_{i=1}^{m} \tilde{A}(\nu_1, t_i) \cdot \sum_{j=1}^{m} N_{ij}\, \tilde{A}(\nu_2, t_j) \tag{4}$$

The term $N_{ij}$ is the element of the so-called Hilbert-Noda transformation matrix [2, 3] which is defined by

$$N_{ij} = \begin{cases} 0 & \text{for } i = j \\ \dfrac{1}{\pi (j - i)} & \text{otherwise} \end{cases} \tag{5}$$

Hilbert transformation yields a new set of spectra which are varying quadrature to (i.e., 90 degrees out of phase with) the original t-dependent spectra. The synchronous and asynchronous correlation intensities formally correspond to the real and imaginary part of the complex cross correlation function calculated along the variable t between two spectral signal variations measured at wavenumbers $\nu_1$ and $\nu_2$. More simply said, the synchronous correlation intensity $\Phi(\nu_1, \nu_2)$ represents the overall similarity or coincidental nature between the two signal variations. The asynchronous correlation intensity $\Psi(\nu_1, \nu_2)$, in contrast, represents the out-of-phase or sequentially varying nature of the signals.

2.2 Properties of 2D correlation spectra

The intensity of a synchronous 2D correlation spectrum $\Phi(\nu_1, \nu_2)$ represents the simultaneous or coincidental changes of spectral intensity variations measured at $\nu_1$ and $\nu_2$ within the observation interval along the externally defined variable t. A synchronous 2D spectrum is a symmetric spectrum with respect to the main diagonal line corresponding to coordinates $\nu_1 = \nu_2$. Correlation peaks appear both at diagonal and off-diagonal positions. The intensity of peaks located at diagonal positions mathematically corresponds to the autocorrelation function (equivalent to the statistical variance) of spectral intensity variations. The diagonal peaks are therefore referred to as autopeaks. An autopeak represents the overall susceptibility of the spectral signal to change intensity when an external perturbation is applied to the system. Cross peaks located at the off-diagonal positions of a synchronous 2D spectrum represent simultaneous or coincidental changes of two different intensity signals observed at coordinates $\nu_1$ and $\nu_2$. Such a synchronized change, in turn, suggests the possible existence of a coupled or related origin of the signal variations. While the sign of an autopeak is always positive, the sign of a cross peak can be either positive or negative. For convenience, negative peaks (troughs) are indicated by shading. The sign of synchronous cross peaks becomes positive if the two spectral signals measured at the wavenumbers $\nu_1$ and $\nu_2$ are either increasing or decreasing together in the same direction as functions of the external variable t. On the other hand, the appearance of a negative cross peak indicates that one of the signals is increasing while the other is decreasing.

An asynchronous spectrum $\Psi(\nu_1, \nu_2)$ represents the out-of-phase or sequential changes of spectral intensities along the perturbation variable. An asynchronous spectrum has no diagonal peak, consisting exclusively of cross peaks located at off-diagonal positions. An asynchronous cross peak develops only if the intensities of two signals measured at $\nu_1$ and $\nu_2$ change out of phase (i.e., delayed or accelerated) with each other. This feature becomes especially useful in discriminating overlapped bands arising from signals of different physical origins. For example, different signal contributions from individual components of a complex mixture, chemical functional groups experiencing different effects from some external field, or inhomogeneous materials comprised of multiple phases, may all be effectively discriminated. Even if spectral features are located close to each other, as long as the signatures or the pattern of signal intensity variations along the external variable t are substantially different, asynchronous cross peaks will develop between their corresponding coordinates.

One can also determine the sequential order of spectral intensity variations from the signs of cross peaks. If the signs of $\Phi(\nu_1, \nu_2)$ and $\Psi(\nu_1, \nu_2)$ are the same, the overall spectral intensity variation observed at $\nu_1$ predominantly occurs prior to that at $\nu_2$. If the signs are different, the order is reversed. If we have $\Psi(\nu_1, \nu_2) = 0$, the variations of spectral intensities at two wavenumbers, $\nu_1$ and $\nu_2$, are completely synchronized. Finally, if we have $\Phi(\nu_1, \nu_2) = 0$, the sequential order of intensity variation cannot be determined. It is important to emphasize that 2DCOS analysis only gives the order of spectral intensity variations but not the order of the presence of particular species contributing to the spectral signals during the observation.
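The construction in Equations 1-5 and the sign rules above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names, the tolerance `tol`, and the two-channel toy data (two sigmoidal intensity increases, one earlier than the other) are assumptions introduced here, and the perturbation variable is taken to be evenly sampled as in the text.

```python
import numpy as np


def hilbert_noda_matrix(m):
    """Equation 5: N[i, j] = 0 on the diagonal, 1 / (pi * (j - i)) elsewhere."""
    idx = np.arange(m)
    diff = idx[None, :] - idx[:, None]      # diff[i, j] = j - i
    N = np.zeros((m, m))
    off_diagonal = diff != 0
    N[off_diagonal] = 1.0 / (np.pi * diff[off_diagonal])
    return N


def two_d_correlation(spectra):
    """Synchronous and asynchronous spectra from an (m, n) array.

    Rows are the m evenly spaced perturbation steps, columns the n spectral points.
    """
    A = np.asarray(spectra, dtype=float)
    m = A.shape[0]
    dynamic = A - A.mean(axis=0)                                   # Equations 1 and 2
    sync = dynamic.T @ dynamic / (m - 1)                           # Equation 3
    asyn = dynamic.T @ hilbert_noda_matrix(m) @ dynamic / (m - 1)  # Equation 4
    return sync, asyn


def sequential_order(sync, asyn, i, j, tol=1e-12):
    """Apply the sign rules of Section 2.2 to the cross peak at (i, j)."""
    phi, psi = sync[i, j], asyn[i, j]
    if abs(phi) < tol:
        return "undetermined"
    if abs(psi) < tol:
        return "synchronized"
    return "i before j" if phi * psi > 0 else "j before i"


# Hypothetical toy data: both intensities rise along t, channel 0 earlier than channel 1.
t = np.linspace(0.0, 1.0, 51)
early = 1.0 / (1.0 + np.exp(-(t - 0.3) / 0.05))
late = 1.0 / (1.0 + np.exp(-(t - 0.7) / 0.05))
sync, asyn = two_d_correlation(np.column_stack([early, late]))
order = sequential_order(sync, asyn, 0, 1)
```

For this toy pair both cross peak intensities at (0, 1) come out positive, so the rule reports that channel 0 changes before channel 1. Swapping the coordinates flips the sign of the asynchronous intensity and therefore the reported order.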

3. 2DCOS analysis of a model protein

3.1 Simulated model protein IR spectra

A large body of literature exists today on the 2DCOS analysis of proteins, polypeptides, and other related molecules. The review chapter compiled by Czarnik-Matusewicz and Jung [20] extensively discusses 2DCOS analysis of proteins in the mid-IR region. In this tutorial, we use a set of illustrative simulated IR spectra for a generic model protein sample. The use of such a model system has the specific advantage that the intended spectral features are known a priori, so the utility of 2DCOS techniques can be demonstrated without experimental ambiguity. Once the performance of 2DCOS is validated, it can then be applied to other real systems with confidence. The model protein system is assumed to have four major contributions from distinct conformational secondary structures, i.e., α-helices, β-sheets, turns, and disordered or random coils. The α-helix conformer is assumed to have an IR absorption peak centered around 1653 cm⁻¹, β-sheet around 1682 and 1614 cm⁻¹, turn around 1670 cm⁻¹, and disordered or random coil around 1644 cm⁻¹. These band positions are selected essentially arbitrarily, but some effort was made to keep them close to known band positions of Amide I IR absorptions of typical proteins.

Figure 1 Simulated IR spectra in the Amide I region (A) of four component conformational secondary structures comprising the assumed model protein, and their population change profiles (B) induced by the rising temperature.

The simulated spectra of individual pure secondary conformational structures are shown in Figure 1A. We now consider changes in the population of individual conformational secondary structures upon heating from 40 °C to 90 °C, as illustrated in Figure 1B. The sample is assumed to be initially composed predominantly of the α-helix structure, which is converted to other forms by the thermally induced denaturation. With increasing temperature to a moderate level, the turn conformation is produced first, followed by the increase in β-sheet population at a higher temperature. All these secondary structures will eventually become disordered random coil at a much higher temperature above 90 °C.

It should be pointed out that the above model assumes the profiles (i.e., position, width and shape) of individual band contributions are not affected by temperature. Only the intensity of each band is affected. This assertion may surprise some readers, as the so-called peak position shift or width change is a very common observation in the temperature-dependent spectral analysis. In fact, 2D correlation spectroscopy is a very sensitive probe for detecting the existence of such band profile changes [3, 23]. When there is a true change in band profiles, very characteristic unusual patterns of cross peak clusters must appear in 2D correlation spectra. On the other hand, if the apparent profile changes arise from the effect of highly overlapped bands reflecting population changes, no such complex patterns should be observed. To this author's knowledge, 2D correlation analysis of a large number of experimentally observed spectra, including those of proteins, almost always points to the overlapped bands instead of the actual profile changes [23-25]. Thus, the simulation based on constant band profile assumption may well be justified.

With this model system, it is now possible to generate a series of IR spectra for a protein sample undergoing the conformational changes with increasing temperature from 40 °C to 90 °C. Figure 2A shows the temperature-dependent IR spectra in the Amide I region of the model protein. Spectral features corresponding to the individual conformational secondary structures are noted. However, they are also heavily overlapped and not easily resolved with simple visual inspection. The corresponding dynamic spectra (Figure 2B) are calculated using Equation 1 with the temperature-averaged spectrum of Equation 2 as the reference.
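A dataset of this general kind can be sketched as follows. Only the band centers (1653, 1682 and 1614, 1670, and 1644 cm⁻¹) and the 40 to 90 °C range come from the text. The Gaussian band shape, the 16 cm⁻¹ width, the relative heights of the two β-sheet bands, and every population profile below are hypothetical choices made for illustration; they are not the profiles of Figure 1.

```python
import numpy as np

nu = np.arange(1580.0, 1720.0, 1.0)   # wavenumber axis in cm^-1 (range assumed)
T = np.linspace(40.0, 90.0, 51)       # temperature axis in deg C, 1-degree steps


def band(center, fwhm=16.0):
    """Unit-height Gaussian band with a fixed profile (width is an assumption)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((nu - center) / sigma) ** 2)


def bump(center, width):
    """Population that rises and then falls with temperature (assumed shape)."""
    return np.exp(-0.5 * ((T - center) / width) ** 2)


components = ["helix", "sheet", "turn", "coil"]

# Pure-component spectra at the band positions quoted in the text.
pure = {
    "helix": band(1653.0),
    "sheet": band(1614.0) + 0.4 * band(1682.0),   # relative heights assumed
    "turn": band(1670.0),
    "coil": band(1644.0),
}

# Hypothetical population profiles: helix decays steadily, turn appears first,
# sheet appears later, and coil accumulates toward the high-temperature end.
population = {
    "helix": 1.0 / (1.0 + np.exp((T - 65.0) / 8.0)),
    "turn": 0.5 * bump(60.0, 8.0),
    "sheet": 0.5 * bump(75.0, 8.0),
    "coil": 1.0 / (1.0 + np.exp(-(T - 75.0) / 8.0)),
}

P = np.column_stack([population[c] for c in components])   # shape (51, 4)
S = np.vstack([pure[c] for c in components])               # shape (4, 140)

spectra = P @ S                             # only band intensities change with T
dynamic = spectra - spectra.mean(axis=0)    # Equations 1 and 2
```

Because each spectrum is a population-weighted sum of fixed band profiles, any apparent peak shift in `spectra` arises purely from overlapped bands changing intensity, which is the constant band profile assumption discussed above.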


Figure 2 Simulated temperature-dependent IR spectra in the Amide I region of a model protein (A), and corresponding dynamic spectra (B) during the heating from 40 °C to 90 °C.

3.2 2DCOS analysis of model protein IR spectra

Contour maps of the synchronous and asynchronous 2D IR correlation spectra of the model protein system undergoing the thermal denaturation are shown in Figure 3. They are constructed from the temperature-dependent dynamic spectra (Figure 2B) using Equations 3-5. The temperature-averaged spectrum is displayed as the reference at the top and left side of each 2D correlation map. Negative correlation intensity areas are indicated by shading. By spreading the spectral features along the second dimension, the apparent spectral resolution is visibly much enhanced. Individual contributions from different conformational structures are clearly separated with the development of distinct auto- and cross peaks.


Figure 3 Contour map representations of the synchronous (A) and asynchronous (B) 2D IR correlation spectra obtained from the temperature-dependent simulated IR spectra of a model protein in Figure 2. Shaded areas indicate negative correlation intensity regions.

Autopeaks are observed in the synchronous spectrum (Figure 3A) near the expected band positions of contributions of individual conformational structures, indicating that their intensities are varying during the course of the temperature change. A pair of strong negative synchronous cross peaks appears between the spectral coordinates of decreasing α-helix and increasing random coil contributions. Two bands assigned to β-sheet conformation have a positive cross peak pair, as their intensities vary in the same direction with increasing temperature. Relative directions of intensity changes for other band pairs become more ambiguous. For example, the synchronous correlation intensity between β-sheet and turn conformations is much weaker, indicating that their spectral intensity variations are highly asynchronous. Furthermore, since these two bands experience both increase and decrease in their intensities during the thermal treatment, declaring the relative directions of intensity changes between them for the entire observation interval may not be appropriate. A method for resolving such competing overlapped responses by segmenting the dataset will be discussed later.

Cross peaks in the asynchronous 2D correlation spectrum (Figure 3B) reveal obvious differences among the patterns of intensity changes of contributing IR bands. Essentially, all bands belonging to different conformational structures develop clear asynchronous cross peaks with those from other structures. Thus, it can be verified that the intensity variation pattern of α-helix is substantially different from that of β-sheet or turn conformations. Likewise, the pattern for random coil variation is different from that for turn or β-sheet structure. This highly advantageous discriminating feature of asynchronous cross peaks is observed even among highly overlapped neighboring bands, such as the α-helix band and the nearby turn contribution.

The obvious lack of asynchronous cross peaks between the two β-sheet bands again validates that the dynamics of bands from the same species are fully synchronized. Interestingly, the overall intensity variations of α-helix and random coil are somewhat similar, with both bands showing steady changes throughout the temperature range. However, these contributions are easily discriminated by the negative synchronous cross peaks, which reflect intensity changes in opposite directions. From the signs of both synchronous and asynchronous cross peaks, one can conclude that the intensity changes of the turn contribution occur earlier, i.e., at a lower temperature, compared to those of β-sheet. Determination of the sequential order of intensity variations becomes easier by segmenting the dataset into smaller temperature blocks as discussed below.

4. Data segmentation for less ambiguous interpretation

4.1 Moving window 2DCOS analysis

As already indicated above, there are situations with several competing or overlapped responses, e.g., creation and subsequent consumption of the turn and β-sheet conformational structures observed within a broad temperature span. In that case, unambiguous determination of the relative directions and sequential order of intensity changes may become difficult. Splitting the dataset into smaller segments, covering only a narrower range focused on a particular process instead of the entire observation interval, greatly simplifies the analysis [26, 27]. However, such data segmentation can sometimes be subjective or somewhat arbitrary. A more objective method to select the appropriate range of segmented dataset is desired.

Thomas and Richardson [28] proposed a systematic approach, known as autocorrelation moving-window two-dimensional (MW2D) method, which later became a very useful tool for the objective segmentation of dataset. A small block of dataset called window is selected first for the standard 2D correlation analysis. The power spectrum, corresponding to the correlation intensities at the main diagonal of the synchronous spectrum, is then obtained from the windowed data block. By incrementing the position of the window along the perturbation axis, a waterfall plot of localized power spectrum as a function of the window position along the perturbation variable is obtained. The autocorrelation MW2D map thus obtained is used to effectively identify the specific region of pronounced spectral intensity variations along the perturbation axis. Standard 2D correlation analysis is then applied to the select regions thus identified for unambiguous interpretation. Szwed et al. [29], for example, used MW2D analysis to study the temperature-dependent IR behavior of a bilayer membrane system to detect the conformational changes of constituents.

Morita et al. [30] proposed an alternative form of moving window analysis called perturbation-correlation moving-window two-dimensional (PCMW2D) analysis. In PCMW2D, spectral intensity variations within a small window at a given wavenumber are correlated against the perturbation variable itself, instead of the intensity variations at other wavenumbers. Again the window position is incremented to yield waterfall plots. The PCMW2D analysis yields the synchronous and asynchronous maps, useful for the identification of large spectral intensity variation ranges and their spreads. Specific mathematical procedures for autocorrelation MW2D and PCMW2D analyses are described below.

For a dataset $A(\nu_k, t_i)$ consisting of m spectral traces spread over the perturbation variable t ranging between $t_1$ and $t_m$, we define $m - 2r$ dataset blocks called windows, each consisting of $2r + 1$ consecutive spectra selected from the original dataset. The center value of the perturbation variable t of the window is set to be $p = t_q$ where index q for each window position is incremented from $r + 1$ to $m - r$. The local average spectrum $\bar{A}(\nu_k, p)$ and local average perturbation variable $\bar{p}(p)$ of the window around the center value p of the window are given by

$$\bar{A}(\nu_k, p) = \frac{1}{2r+1} \sum_{i=q-r}^{q+r} A(\nu_k, t_i) \tag{6}$$

$$\bar{p}(p) = \frac{1}{2r+1} \sum_{i=q-r}^{q+r} t_i \tag{7}$$

The local dynamic spectra $\tilde{A}(\nu_k, t_i)$ and local dynamic perturbation variable $\tilde{p}(t_i)$ are given by

$$\tilde{A}(\nu_k, t_i) = A(\nu_k, t_i) - \bar{A}(\nu_k, p) \tag{8}$$

$$\tilde{p}(t_i) = t_i - \bar{p}(p) \tag{9}$$

We now apply 2D correlation analysis to the local set of dynamic spectra within the window. The autocorrelation MW2D spectrum $\Omega_A(\nu_k, p)$ at the perturbation variable $p = t_q$ is given by

$$\Omega_A(\nu_k, p) = \frac{1}{2r+1} \sum_{i=q-r}^{q+r} \tilde{A}(\nu_k, t_i)^2 \tag{10}$$

This one dimensional spectrum corresponds to the slice of the synchronous 2D spectrum of the window along the main diagonal line, i.e., autopower spectrum or the autocorrelation function of the spectral intensity variations. In short, $\Omega_A(\nu_k, p)$ represents the measure of the localized extent of spectral intensity variations around the perturbation variable p of the window. For a relatively small window size, it has been shown that the autocorrelation MW2D spectrum is essentially proportional to the square of the slope of $A(\nu_k, t_i)$ with respect to the perturbation variable.

$$\Omega_A(\nu_k, p) \propto \left[ \frac{\partial A(\nu_k, t)}{\partial t} \right]^2_{t=p} \tag{11}$$
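As a quick plausibility check of Equation 11, consider the limiting case in which the intensity varies linearly across a window. The slope $s$ and the sampling increment $\delta$ are symbols introduced here for illustration only, and the linear form is an assumed approximation that holds for sufficiently small windows.

```latex
\begin{align*}
A(\nu_k, t_i) &\approx A(\nu_k, p) + s\,(t_i - p),
  \qquad s = \left[ \frac{\partial A(\nu_k, t)}{\partial t} \right]_{t=p},
  \qquad t_i - p = (i - q)\,\delta \\
\tilde{A}(\nu_k, t_i) &= A(\nu_k, t_i) - \bar{A}(\nu_k, p) = s\,(i - q)\,\delta \\
\Omega_A(\nu_k, p) &= \frac{1}{2r+1} \sum_{i=q-r}^{q+r} s^2 (i - q)^2 \delta^2
  = \frac{r(r+1)}{3}\,\delta^2 s^2
\end{align*}
```

The second line uses the fact that the local average of a linear signal over a symmetric window equals its value at the window center, and the last step uses $\sum_{k=-r}^{r} k^2 = r(r+1)(2r+1)/3$. For a fixed window size and sampling increment the prefactor is constant, so $\Omega_A$ scales with the square of the local slope, consistent with Equation 11.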

The synchronous and asynchronous PCMW2D spectra, $\Omega_\Phi(\nu_k, p)$ and $\Omega_\Psi(\nu_k, p)$, are obtained by correlating the localized spectral intensity variations to the change in the perturbation variable itself within the window. They are obtained by

$$\Omega_\Phi(\nu_k, p) = \frac{1}{2r+2} \sum_{i=q-r}^{q+r} \tilde{A}(\nu_k, t_i) \cdot \tilde{p}(t_i) \tag{12}$$

$$\Omega_\Psi(\nu_k, p) = \frac{1}{2r+2} \sum_{i=q-r}^{q+r} \tilde{A}(\nu_k, t_i) \cdot \sum_{j=q-r}^{q+r} N_{ij}\, \tilde{p}(t_j) \tag{13}$$

The element of the Hilbert-Noda transformation matrix $N_{ij}$ is given by Equation 5. The synchronous PCMW2D spectrum is a measure of how closely the spectral intensity variation is following the perturbation variable. The asynchronous spectrum, in turn, represents the out-of-phase or delayed response of spectral intensity variation with respect to the perturbation. Like the autocorrelation MW2D case, PCMW2D spectra for windows with a small size are related to the derivatives of the spectral intensity variations with respect to the perturbation variable. The synchronous spectrum becomes proportional to the first derivative, while the asynchronous spectrum is proportional to the negative second derivative.

$$\Omega_\Phi(\nu_k, p) \propto \left[ \frac{\partial A(\nu_k, t)}{\partial t} \right]_{t=p} \tag{14}$$

$$\Omega_\Psi(\nu_k, p) \propto - \left[ \frac{\partial^2 A(\nu_k, t)}{\partial t^2} \right]_{t=p} \tag{15}$$

Based on these properties, Jung et al. [31] proposed an even simpler approach called gradient mapping, which directly constructs the waterfall plots of the first and second derivatives, as an effective substitute for the PCMW2D analysis.

Figure 4 shows the example of an autocorrelation MW2D spectrum and synchronous and asynchronous PCMW2D spectra constructed from the simulated IR spectra (Figure 2) of the model protein. Three major temperature regions of interest, i.e., 40 to 60 °C, 60 to 75 °C, and 75 to 90 °C, are identified from the plot. While the autocorrelation MW2D spectrum adequately identifies the temperature ranges of interest, the synchronous PCMW2D spectrum provides additional information about the increasing and decreasing trends of the spectral intensities. Peak positions in the asynchronous PCMW2D spectrum reflect the inflection points of the spectral intensity changes. They are useful in defining the tighter boundaries of the select temperature ranges. We use the segmented dataset based on the identified temperature ranges of interest for further 2D correlation analysis.
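The moving-window procedure of Equations 6-13 can be sketched as a simple loop over window positions. The function names and the two-channel toy data (one sigmoidal increase centered at 60 °C and its mirror-image decrease) are assumptions made for illustration. The prefactors follow Equations 10, 12 and 13 as written; since only relative intensities across a map are interpreted, they act as constant scale factors. The last two lines show the gradient-mapping alternative using `numpy.gradient`.

```python
import numpy as np


def hilbert_noda_matrix(size):
    """Equation 5: N[i, j] = 0 on the diagonal, 1 / (pi * (j - i)) elsewhere."""
    idx = np.arange(size)
    diff = idx[None, :] - idx[:, None]
    N = np.zeros((size, size))
    off_diagonal = diff != 0
    N[off_diagonal] = 1.0 / (np.pi * diff[off_diagonal])
    return N


def moving_window_maps(spectra, t, r):
    """Return (centers, auto, pc_sync, pc_asyn); one map row per window position.

    spectra has shape (m, n); each window holds 2r + 1 consecutive spectra.
    """
    A = np.asarray(spectra, dtype=float)
    t = np.asarray(t, dtype=float)
    m = A.shape[0]
    N = hilbert_noda_matrix(2 * r + 1)
    centers, auto, pc_sync, pc_asyn = [], [], [], []
    for q in range(r, m - r):                        # zero-based; m - 2r windows
        window = slice(q - r, q + r + 1)
        dyn = A[window] - A[window].mean(axis=0)     # Equations 6 and 8
        p_dyn = t[window] - t[window].mean()         # Equations 7 and 9
        centers.append(t[q])
        auto.append((dyn ** 2).sum(axis=0) / (2 * r + 1))      # Equation 10
        pc_sync.append(dyn.T @ p_dyn / (2 * r + 2))            # Equation 12
        pc_asyn.append(dyn.T @ (N @ p_dyn) / (2 * r + 2))      # Equation 13
    return tuple(np.array(x) for x in (centers, auto, pc_sync, pc_asyn))


# Hypothetical toy data: channel 0 rises around 60 deg C, channel 1 falls.
T = np.linspace(40.0, 90.0, 51)
rising = 1.0 / (1.0 + np.exp(-(T - 60.0) / 3.0))
spectra = np.column_stack([rising, 1.0 - rising])
centers, auto, pc_sync, pc_asyn = moving_window_maps(spectra, T, r=2)

# Gradient mapping: first and second derivatives along the perturbation axis.
first = np.gradient(spectra, T, axis=0)
second = np.gradient(first, T, axis=0)
```

For the rising channel the autocorrelation map peaks at the window centered on the inflection temperature, the synchronous PCMW2D map is positive throughout, and the asynchronous map changes sign across the inflection point, in line with Equations 11, 14 and 15.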

Figure 4 Autocorrelation MW2D (A), synchronous PCMW2D (B), and asynchronous PCMW2D (C) spectra of the simulated model protein IR spectra in Figure 2A.

4.2 2DCOS analysis of segmented dataset

With the identification of appropriate temperature ranges representing individual spectral intensity variation dynamics, 2DCOS analysis may be applied to smaller segmented datasets leading to substantially less ambiguous interpretation. Figure 5 shows the 2D correlation spectra of the model protein within the temperature range between 40 °C and 60 °C. In this range, spectral intensities are all increasing with the single exception of the α-helix band. Positive synchronous cross peaks except for those at 1653 cm⁻¹ support this fact. The sequential order of spectral intensity variations is now unambiguously determined. The intensity of the band for the turn conformation changes ahead of every other band at a lower temperature. The increase in the β-sheet occurs after the growth of the turn but before the continuing disappearance of α-helix.


Figure 5 Synchronous (A) and asynchronous (B) 2D IR correlation spectra of simulated model protein constructed from the select temperature range from 40 °C to 60 °C.

2D correlation spectra for the temperature range between 60 °C and 75 °C are shown in Figure 6. Intensities of β-sheet bands around 1682 and 1614 cm⁻¹ start increasing in this temperature range, which is indicated by the presence of negative synchronous cross peaks correlated with the α-helix band at 1653 cm⁻¹, which is decreasing in intensity. Comparison of the signs of synchronous and asynchronous cross peaks reveals that the β-sheet growth occurs before the increase in disordered random coil around 1644 cm⁻¹ and further continuation of the disappearance of α-helix conformations. In this temperature range, spectral intensity of the turn conformation around 1670 cm⁻¹ has started decreasing with temperature, but the decrease is obviously lagging behind the steady decrease in α-helix already in progress from the beginning. Positive synchronous and asynchronous cross peaks between the two bands at (1653, 1670) validate this fact. Increase in the disordered random coil population is lagging behind changes of all other bands with increasing temperature.


Figure 6 Synchronous (A) and asynchronous (B) 2D IR correlation spectra of simulated model protein constructed from the select temperature range from 60 °C to 75 °C.

Figure 7 Synchronous (A) and asynchronous (B) 2D IR correlation spectra of simulated model protein constructed from the select temperature range from 75 °C to 90 °C.

2D correlation spectra for the final temperature range between 75 and 90 °C are shown in Figure 7. In this temperature block of the spectral dataset, all band intensities are decreasing except for the disordered random coil contribution at 1644 cm⁻¹, which produces a series of negative synchronous cross peaks along this coordinate. Cross peak signs indicate that the decrease in the intensity of β-sheet bands at 1682 and 1614 cm⁻¹ occurs later, at a higher temperature, after that for the turn conformation at 1670 cm⁻¹. The depletion of the turn and β-sheet structures continues well into the higher temperature range, lagging behind the earlier depletion of α-helix at 1653 cm⁻¹ accompanied by the accumulation of random coil at 1644 cm⁻¹, which have already been largely completed at a lower temperature.

As demonstrated in this section, appropriate segmentation of the spectral dataset may significantly improve our ability to interpret 2D correlation spectra reflecting complex dynamics of spectral intensity changes. Instead of constructing 2D correlation spectra using the entire range of a perturbation variable like temperature (Figure 3), smaller blocks of the dataset were used to generate a series of 2D correlation spectra (Figures 5-7) for a more focused analysis depicting individual spectral intensity evolution dynamics. Moving window or gradient map analysis is a convenient tool to objectively select the specific range of the dataset for such focused 2DCOS study.
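The segmentation procedure can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used to produce Figures 5-7. It assumes that Equations 3 and 4 take their usual forms (synchronous spectrum as the covariance of dynamic spectra, asynchronous spectrum through the Hilbert-Noda transformation matrix), that spectra are evenly spaced along the perturbation axis, and that each block uses its own average spectrum as the reference. The function names and the array layout (rows are perturbation points, columns are wavenumbers) are hypothetical.

```python
import numpy as np

def noda_matrix(m: int) -> np.ndarray:
    """Hilbert-Noda matrix: N[j, k] = 0 if j == k, else 1 / (pi * (k - j))."""
    idx = np.arange(m)
    diff = idx[None, :] - idx[:, None]          # diff[j, k] = k - j
    n = np.zeros((m, m))
    off_diag = diff != 0
    n[off_diag] = 1.0 / (np.pi * diff[off_diag])
    return n

def two_d_correlation(spectra: np.ndarray):
    """Return (synchronous, asynchronous) spectra for an (m, n) dataset."""
    m = spectra.shape[0]
    dyn = spectra - spectra.mean(axis=0)        # dynamic spectra of this block
    sync = dyn.T @ dyn / (m - 1)
    asyn = dyn.T @ noda_matrix(m) @ dyn / (m - 1)
    return sync, asyn

def segmented_2dcos(spectra: np.ndarray, t: np.ndarray, edges):
    """Compute 2DCOS separately for each block [edges[i], edges[i+1]]."""
    results = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        block = (t >= lo) & (t <= hi)           # boundary spectra are shared
        results[(lo, hi)] = two_d_correlation(spectra[block])
    return results
```

With the block boundaries used in this section, a call would look like `segmented_2dcos(spectra, temperature, [40.0, 60.0, 75.0, 90.0])`, where the boundaries themselves would be chosen from a moving window or gradient map analysis.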

5 Two-dimensional codistribution spectroscopy (2DCDS)

2DCOS analysis is specifically designed to sort out the sequential order of spectral intensity variations along the perturbation variable. As such, it may not be the best tool to determine the order of the presence of particular species distributed within the observation interval. However, evolving distribution dynamics of constituent species may be the more desired information. This is especially true in the analysis of various reaction systems involving intermediate species appearing and disappearing during the observed process, along with the consumption of the reactants initially present and the formation of the final products. Two-dimensional codistribution spectroscopy (2DCDS), introduced in 2014 [32, 33], is a useful tool for this type of analysis.

The asynchronous 2D codistribution spectrum $\Delta(\nu_1, \nu_2)$ is defined as

$$\Delta(\nu_1, \nu_2) = \frac{\mathrm{T}(\nu_1, \nu_2)}{m(m-1)} \sum_{i=1}^{m} i \left\{ \frac{\tilde{A}(\nu_2, t_i)}{\bar{A}(\nu_2)} - \frac{\tilde{A}(\nu_1, t_i)}{\bar{A}(\nu_1)} \right\} \qquad (16)$$

The dynamic spectrum $\tilde{A}(\nu_k, t_i)$ and average spectrum $\bar{A}(\nu_k)$ are obtained as before using Equations 1 and 2. The total joint variance $\mathrm{T}(\nu_1, \nu_2)$ is given by

$$\mathrm{T}(\nu_1, \nu_2) = \sqrt{\Phi(\nu_1, \nu_1) \cdot \Phi(\nu_2, \nu_2)} \qquad (17)$$

By convention, the value of $\Delta(\nu_1, \nu_2)$ is set to zero if the condition $\bar{A}(\nu_1) = 0$ or $\bar{A}(\nu_2) = 0$ is encountered.
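Equation 16 can be read as a difference of first moments along the perturbation axis. The short derivation below is offered as an interpretation rather than the original derivation. It assumes that Equations 1 and 2 define the dynamic spectrum as the deviation from the arithmetic mean over the m spectra, and it introduces the symbol $\bar{k}(\nu)$ (an intensity-weighted mean index) purely for illustration.

```latex
\begin{align*}
\bar{k}(\nu) &\equiv \frac{1}{m\,\bar{A}(\nu)} \sum_{i=1}^{m} i\,A(\nu, t_i), \\
\sum_{i=1}^{m} i\,\frac{\tilde{A}(\nu, t_i)}{\bar{A}(\nu)}
  &= \sum_{i=1}^{m} i\,\frac{A(\nu, t_i)}{\bar{A}(\nu)} - \sum_{i=1}^{m} i
   = m\,\bar{k}(\nu) - \frac{m(m+1)}{2}, \\
\Delta(\nu_1, \nu_2)
  &= \frac{\mathrm{T}(\nu_1, \nu_2)}{m(m-1)}
     \left[ m\,\bar{k}(\nu_2) - m\,\bar{k}(\nu_1) \right]
   = \mathrm{T}(\nu_1, \nu_2)\,\frac{\bar{k}(\nu_2) - \bar{k}(\nu_1)}{m-1}.
\end{align*}
```

Under these assumptions the constant term cancels in the difference, so the sign of the codistribution intensity is set only by which of the two signals has its intensity centered at the smaller index, i.e., earlier along the perturbation variable.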


Asynchronous codistribution intensity is a measure of the difference in the distribution of two spectral signals along the perturbation variable axis, based on the well-known mathematical tool of moment analysis [32, 33]. The interpretation of the asynchronous codistribution spectrum for the presence of spectral signals distributed along the perturbation variable is straightforward. For a cross peak with a positive sign, i.e., $\Delta(\nu_1, \nu_2) > 0$, the spectral signal at $\nu_1$ is distributed predominantly at the earlier stage along the perturbation variable t compared to that for $\nu_2$. In contrast, if $\Delta(\nu_1, \nu_2) < 0$, the order is reversed. In the case of $\Delta(\nu_1, \nu_2) \approx 0$, the average distributions of the spectral signals observed at the two wavenumbers over the course of the observation interval are similar.
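A direct numerical transcription of Equations 16 and 17 may help make the sign convention concrete. The sketch below is an assumption-laden illustration: it takes the dynamic spectrum to be the deviation from the mean spectrum and the autocorrelation intensity to be the sample variance with an (m − 1) denominator, and it treats the spectra as evenly spaced so that the index i stands in for the perturbation variable. The function name and array layout (rows are perturbation points, columns are wavenumbers) are hypothetical.

```python
import numpy as np

def async_codistribution(spectra) -> np.ndarray:
    """Asynchronous 2D codistribution map; delta[a, b] = Delta(nu_a, nu_b)."""
    a = np.asarray(spectra, dtype=float)
    m = a.shape[0]
    mean = a.mean(axis=0)                        # average spectrum
    dyn = a - mean                               # dynamic spectra
    var = (dyn ** 2).sum(axis=0) / (m - 1)       # Phi(nu, nu)
    joint = np.sqrt(np.outer(var, var))          # total joint variance T
    index = np.arange(1, m + 1)                  # i = 1 ... m
    safe_mean = np.where(mean != 0, mean, 1.0)   # avoid division by zero
    moment = (index @ dyn) / safe_mean           # sum_i i * dyn / mean
    # delta[a, b] uses (term at nu_b) - (term at nu_a), as in Equation 16.
    delta = joint * (moment[None, :] - moment[:, None]) / (m * (m - 1))
    # Convention: zero wherever either average intensity vanishes.
    zero = mean == 0
    delta[zero, :] = 0.0
    delta[:, zero] = 0.0
    return delta
```

For two bands where the first is present mainly at early indices and the second mainly at late indices, `delta[0, 1]` comes out positive and `delta[1, 0]` negative, which matches the reading that a positive Δ(ν1, ν2) places the ν1 signal earlier.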

Figure 8 shows the asynchronous 2D codistribution spectrum generated from the simulated IR spectra of the model protein taken from Figure 2 using Equations 16 and 17. Negative intensity codistribution cross peaks are indicated by shading, and the temperature-averaged spectrum is provided as the reference at the top and left side of the codistribution map. As in the case of the asynchronous 2D correlation spectrum (Figure 3B), the 2D codistribution spectrum is also an antisymmetric map with respect to the main diagonal line, and no peak appears on the main diagonal line. While 2DCOS and 2DCDS spectra may even look somewhat similar in their appearance, they actually carry very different types of information.

The vertical series of three positive cross peaks located at the position of $\nu_1$ = 1653 cm⁻¹ reveals that α-helix has already been present at the early stage of the spectral evolution, i.e., at a lower temperature range. In contrast, only negative cross peaks indicated by shading are observed at 1644 cm⁻¹. Disordered random coil conformations evolve into existence at a much later stage of the heating process. The presence of a strong cross peak at the spectral coordinate (1653, 1644) in the upper portion of the 2DCDS spectrum clearly suggests that α-helix is in existence before the creation of random coil. Band positions of 1682, 1670 and 1614 cm⁻¹ exhibit both positive and negative 2DCDS cross peaks, indicating that their appearance occurs in between, sometime after the initial presence of α-helix but before the later appearance of random coil upon heating. Based on such observation, one may even infer a possible mechanism in which some of the α-helix conformation is transformed to the turn or β-sheet structure before being fully denatured to the disordered state. The negative cross peak at (1682, 1670) suggests that the turn conformation is formed at a lower temperature than β-sheet. There is no cross peak at (1682, 1614), because their intensities develop at the same time. It is often an indication that such synchronously distributed bands arise from the same species.

Figure 8 Asynchronous 2D codistribution spectrum generated from the simulated IR spectra of a model protein (Figure 2). Shaded areas indicate negative correlation intensity regions.

The interest in the use of 2DCDS analysis is increasing, especially in the field of biomolecular studies. For example, Ramer and Ashton [34] successfully applied 2DCDS analysis to the Raman optical activity and UV resonance Raman studies of thermally and chemically induced conformational transformations of polypeptide and polynucleotide. We anticipate substantial growth of 2DCDS activities in protein analysis.

6. Feature enhancement techniques for 2DCOS

In recent years, several useful data pretreatment techniques have been developed to enhance the performance of 2D correlation spectra. Some of them have been discussed in a recent review article [32]. In this section, two specific techniques effective in protein research, i.e., Pareto scaling and null-space projection (NSP), are examined.

6.1 Pareto scaling


The extent of spectral intensity variations induced by a perturbation differs greatly from band to band. Some spectral responses are very strong, while others may be much weaker but still important. If there is a large discrepancy in the extent of responses of neighboring bands, weaker signals may be obscured by the more dominant contributions. In the contour mapping display used in 2DCOS, peaks are often scaled according to the strongest correlation intensity point of the entire spectrum. Such scaling is well suited for displaying peaks with an intermediate level of correlation intensities. However, weaker correlation peaks may become less obvious, especially if there are a few very strong correlation peaks present in the same map.

A scaling operation is sometimes applied to the dataset prior to 2D correlation analysis to attenuate the effect of strongly varying signals [35, 36]. There have been several attempts to utilize the so-called Pearson unit-variance scaling, also known as auto-scaling, to totally eliminate the effect of the magnitude of spectral intensity variations and extract purely correlative information [32, 36]. In unit-variance scaling, the dynamic spectrum $\tilde{A}(\nu_k, t_i)$ is divided by the standard deviation $\sigma(\nu_k) = \sqrt{\Phi(\nu_k, \nu_k)}$ before 2D correlation analysis. The cross correlation intensity used in standard 2DCOS is replaced by the Pearson's correlation coefficient value bound between −1 and 1. It turned out that such a treatment is not a very useful tool in 2D correlation analysis. Cross peaks with clear demarcation are changed to continuously connected regions of squares without discernible boundaries. Autopeaks, which are useful in identifying the location and extent of intensity variations, are completely eliminated from the synchronous spectrum. Finally, it was also realized that even a small level of noise may be disproportionately enlarged, particularly in the spectral region with a relatively low level of signals. Due to such obvious shortcomings, the unit-variance scaling technique is no longer used much in the 2DCOS field.

Pareto scaling [35] is a pragmatic compromise, which eliminates the deficiencies of unit-variance scaling while mostly achieving the desired effect of reducing the interference from strong intensity contributions. In the Pareto-scaled 2D correlation spectra, fine features previously obscured by the large variations of neighboring bands become much more visible. To apply Pareto scaling, dynamic spectra are divided by the square root of the standard deviation. This simple operation leads to the Pareto-scaled 2D correlation spectra given by

$$\Phi(\nu_1, \nu_2)_{Pareto} = \Phi(\nu_1, \nu_2) / \sqrt{\mathrm{T}(\nu_1, \nu_2)} \qquad (18)$$

$$\Psi(\nu_1, \nu_2)_{Pareto} = \Psi(\nu_1, \nu_2) / \sqrt{\mathrm{T}(\nu_1, \nu_2)} \qquad (19)$$
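The equivalence between scaling the dynamic spectra and dividing the finished 2D spectra by the square root of the total joint variance can be checked numerically. The sketch below is only an illustration under stated assumptions: the synchronous and asynchronous spectra are computed in their usual forms (covariance and Hilbert-Noda transformation, assumed here for Equations 3 and 4), the standard deviation is the square root of the autocorrelation intensity, and all function names are hypothetical.

```python
import numpy as np

def noda_matrix(m: int) -> np.ndarray:
    """Hilbert-Noda matrix: N[j, k] = 0 if j == k, else 1 / (pi * (k - j))."""
    idx = np.arange(m)
    diff = idx[None, :] - idx[:, None]
    n = np.zeros((m, m))
    off_diag = diff != 0
    n[off_diag] = 1.0 / (np.pi * diff[off_diag])
    return n

def two_d_correlation(dyn: np.ndarray):
    """Synchronous and asynchronous spectra from (m, n) dynamic spectra."""
    m = dyn.shape[0]
    sync = dyn.T @ dyn / (m - 1)
    asyn = dyn.T @ noda_matrix(m) @ dyn / (m - 1)
    return sync, asyn

def pareto_scale(dyn: np.ndarray) -> np.ndarray:
    """Divide each wavenumber column by the square root of its std. dev."""
    sigma = dyn.std(axis=0, ddof=1)      # sqrt(Phi(nu, nu)) for mean-centered data
    scale = np.sqrt(sigma)
    scale[scale == 0] = 1.0              # leave constant columns untouched
    return dyn / scale
```

Computing `two_d_correlation(pareto_scale(dyn))` should agree, up to rounding, with dividing the unscaled spectra element-wise by `np.sqrt(T)`, where `T` is built from the diagonal of the unscaled synchronous spectrum as in Equation 17. Because the divisor is positive, cross peak signs are unchanged, and autopeaks remain on the diagonal with a value equal to the standard deviation rather than the variance.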

The total joint variance $\mathrm{T}(\nu_1, \nu_2)$ is given by Equation 17. As the signs of cross peaks are not at all affected by the Pareto scaling, one can still use the same cross peak sign rules to determine the sequential order of the spectral intensity variations. Szyc et al. [37], for example, effectively utilized Pareto scaling to enhance the weak 2D IR spectral features observed for the 2-pyridone isomerization. This technique can also be broadly used in protein studies.

Figure 9 shows the Pareto-scaled 2D correlation spectra in contrast to the unscaled version (Figure 3). Subtle features associated with the β-sheet and turn conformations, which were obscured by the strong contributions from the significant intensity changes of α-helix and disordered conformations, have become much more visible. Both auto- and cross peaks for the β-sheet and turn are obviously more pronounced. The negative synchronous correlation peaks between the α-helix and β-sheet, barely discernible in Figure 3A without Pareto scaling, are now clearly identified. It is interesting, however, to point out that most of the distinct features of individual contributions from different conformational structures are already adequately captured in the asynchronous spectrum (Figure 3B) even without Pareto scaling. The primary advantage of the Pareto scaling operation seems to lie in the more detailed identification of fine features for the synchronous correlation, which is critical in determining the sequential order of intensity changes.

Figure 9 Pareto-scaled synchronous (A) and asynchronous (B) 2D IR correlation spectra obtained from the temperature-dependent simulated IR spectra of a model protein in Figure 2. Shaded areas indicate negative correlation intensity regions.


6.2 Null-space projection (NSP)

2D correlation analysis can highlight and sort out complex dynamics of overlapped spectral bands which are responding differently to a given perturbation. By spreading individual bands along the second dimension, spectral resolution is markedly improved and many more features often become noticeable. This seemingly favorable advantage may also bring out some unexpected difficulty, as 2D correlation spectra may become highly congested with too many cross peaks. Interpretation of 2D correlation spectra which are overly rich in detailed information can be a challenge. A technique which can selectively eliminate certain portions of 2D correlation spectra for more streamlined interpretation is discussed here.

Null-space projection (NSP) is a data manipulation technique based on the vector space projection methodology introduced to the 2DCOS field [32, 38]. NSP simplifies congested 2D spectra with too many cross peaks by selectively and systematically eliminating unwanted interfering features. In turn, the remaining features previously obscured by such interference become more clearly visible in the NSP-treated 2D spectra. While NSP is often implemented by employing the vector-matrix notation, we use here a simpler notation without matrices. In NSP, a certain variational pattern along the perturbation variable axis, represented as the projecting variable $y(t)$, is eliminated from the dataset by a mathematical projection operation. While the feature to be eliminated could be selected from many different types of signals, it is often chosen from a specific spectral intensity variation pattern at a select wavenumber $\nu_{target}$ already captured within the original spectral dataset, i.e., $y(t_i) = \tilde{A}(\nu_{target}, t_i)$ of the dynamic spectra $\tilde{A}(\nu_k, t_i)$. Such a signal may represent the spectral contribution of a particular species that is obscuring the features of the rest by overcrowding the 2D spectra.

The NSP operation is carried out in the following steps. First, the feature to be eliminated is selected as the projecting variable $y(t_i)$. The so-called projected portion $\tilde{A}_p(\nu_k, t_i)$ of the dynamic spectra is then given by

$$\tilde{A}_p(\nu_k, t_i) = \frac{y(t_i)}{\sum_{j=1}^{m} y(t_j)^2} \sum_{j=1}^{m} y(t_j) \cdot \tilde{A}(\nu_k, t_j) \qquad (20)$$

Mathematically, $\tilde{A}_p(\nu_k, t_i)$ corresponds to the part of $\tilde{A}(\nu_k, t_i)$ projected onto the vector space spanned by the projecting variable $y(t_i)$. It simply means that only the portion of $\tilde{A}(\nu_k, t_i)$ for a given $\nu_k$ which is strictly proportional to $y(t_i)$ is retained in $\tilde{A}_p(\nu_k, t_i)$. All other variational features of spectral intensities originally present in $\tilde{A}(\nu_k, t_i)$ are discarded. Now we apply an additional operation called the positive projection trick to $\tilde{A}_p(\nu_k, t_i)$ to obtain a new dataset $\tilde{A}_{p+}(\nu_k, t_i)$. In this operation, any part of $\tilde{A}_p(\nu_k, t_i)$ for a given $\nu_k$ which does not satisfy the condition $\sum_{j=1}^{m} \tilde{A}_p(\nu_k, t_j) \cdot y(t_j) > 0$ is replaced with zero. The positive projection assures that each component of $\tilde{A}_{p+}(\nu_k, t_i)$ is not only proportional to the projector $y(t_i)$ in size but also pointing in the same direction in the vector space. Finally, the null-space projected dataset $\tilde{A}_{nsp}(\nu_k, t_i)$ is given by

$$\tilde{A}_{nsp}(\nu_k, t_i) = \tilde{A}(\nu_k, t_i) - \tilde{A}_{p+}(\nu_k, t_i) \qquad (21)$$

2DCOS analyses using $\tilde{A}_{nsp}(\nu_k, t_i)$ in Equations 3 and 4 instead of the original dynamic spectra $\tilde{A}(\nu_k, t_i)$ yield a new set of 2D spectra devoid of any features represented by the projecting variable $y(t_i)$.

Figure 10 shows the NSP-treated 2D correlation IR spectra of the model protein. In this example, the spectral intensity variation observed at 1670 cm⁻¹ is selected as the projector, which is used for the NSP treatment of the original dataset found in Figure 2. This specific band corresponds to the contribution from the turn conformation with the characteristic dynamics of band intensity, which originally increases up to 60 °C and then decreases with temperature rise as shown in Figure 1B. By applying the NSP, all dynamic spectral features showing a pattern similar to that of the behavior of the 1670 cm⁻¹ band are eliminated. The resulting 2D correlation spectra depict exclusively the contributions from the rest of the conformational structures without any contribution from the turn conformation.

Compared to the original 2D correlation spectra of the same protein system without NSP treatment (Figure 3), features of the NSP-treated 2D spectra (Figure 10) are obviously much simplified, and peak positions are more accurately identified. For example, the removal of the interfering contribution from the turn conformation at 1670 cm⁻¹ provides a much cleaner depiction of the dynamics of the neighboring β-sheet band at 1682 cm⁻¹. The intensity of this β-sheet band increases up to about 75 °C and then decreases with temperature rise (Figure 1B). The relative similarity of the temperature-dependent dynamics of intensity changes and the proximity of the band positions make the differentiation of the β-sheet and turn contributions more difficult. NSP treatment makes it possible to observe the relative behavior of the pure β-sheet band with respect to the rest of the system.


Figure 10 Synchronous (A) and asynchronous (B) 2D IR correlation spectra obtained from the temperature-dependent simulated IR spectra of a model protein after the NSP treatment using the intensity variations of the turn component at 1670 cm⁻¹ as the projector. Shaded areas represent negative intensity.

Obviously, intensity dynamics of bands other than the turn conformation at 1670 cm⁻¹ may be used for the NSP operation to remove contributions from different conformational structures. For example, by selecting the band intensity variation at 1682 or 1614 cm⁻¹ as the projector for NSP, it is possible to remove the contribution from β-sheet to construct a new set of 2D spectra. Likewise, contributions from the α-helix or disordered conformational structures may be selectively removed from the analysis. It is also possible to apply NSP treatment more than once to remove contributions from multiple constituents. Finally, the NSP projector does not have to be selected from the spectral intensity variations. External perturbations, e.g., temperature or concentration, may be used to remove the direct effect arising from them on spectral intensity changes. NSP is one of the most promising recent developments in the field of 2DCOS analysis.

7 Conclusions

2DCOS analysis is a powerful and versatile technique applicable to a broad range of spectroscopic applications, especially for the study of complex systems like proteins. The construction of 2D correlation spectra described in this tutorial is relatively straightforward. One only needs a series of systematically varying spectra generated by applying an external perturbation like temperature to the sample during the measurement. The set of spectra is then converted to the synchronous and asynchronous 2D correlation spectra representing, respectively, the similarity and dissimilarity of intensity variations. The spectral resolution is enhanced by spreading the overlapped bands along the second dimension. 2D correlation spectra provide rich information about the presence of coordinated or independent changes among spectral signals. Relative directions and sequential order of spectral intensity variations may be deduced using a simple set of rules applied to the signs of cross peaks.

Additional techniques useful in the 2DCOS analysis of proteins are also discussed. If the dataset covers too broad a range of the perturbation variable, segmentation into smaller blocks of data with narrower ranges of observation interval greatly simplifies the interpretation of 2DCOS results. Objective dataset segmentation is effectively assisted by the use of moving window correlation techniques. For the determination of the sequential order of the distributed presence of intermediate species during the observed process interval, the newly introduced 2DCDS analysis may be the tool of choice. Pareto scaling is a selective attenuation technique to bring out subtle but important features sometimes obscured by the presence of a few very strong peaks dominating the 2D correlation spectra. Null-space projection is a powerful technique to eliminate particular spectral intensity variation patterns, such as those arising from the contribution of select species, to simplify the interpretation of congested 2D correlation spectra. These enhancement techniques should further assist the application of 2DCOS analysis in the study of proteins and related systems.

References

[1] I. Noda, Generalized two-dimensional correlation method applicable to infrared, Raman, and other types of spectroscopy, Appl. Spectrosc. 47 (9) (1993) 1329-1336.

[2] I. Noda, A.E. Dowrey, C. Marcott, G.M. Story, Y. Ozaki, Generalized two-dimensional correlation spectroscopy, Appl. Spectrosc. 54 (7) (2000) 236A-248A.

[3] I. Noda, Y. Ozaki, Two-Dimensional Correlation Spectroscopy. Applications in Vibrational and Optical Spectroscopy, John Wiley & Sons, Chichester, 2004.

[4] I. Noda, Generalized two-dimensional correlation spectroscopy, in: J. Laane (Ed.), Frontiers of Molecular Spectroscopy, Elsevier, Amsterdam, 2009, pp. 367-381.

[5] I. Noda, Progress in 2D correlation spectroscopy, in: Y. Ozaki, I. Noda (Eds.), Two-Dimensional Correlation Spectroscopy, AIP Press, Melville, 2000, pp. 3-17.

[6] I. Noda, Advances in two-dimensional correlation spectroscopy, Vib. Spectrosc. 36 (2) (2004) 143-165.

[7] I. Noda, Progress in two-dimensional (2D) correlation spectroscopy, J. Mol. Struct. 799 (1-3) (2006) 2-15.

[8] I. Noda, Recent advancement in the field of two-dimensional correlation spectroscopy, J. Mol. Struct. 883-884 (2008) 2-26.

[9] I. Noda, Two-dimensional correlation spectroscopy - biannual survey 2007-2009, J. Mol. Struct. 974 (1-3) (2010) 3-24.

[10] I. Noda, Frontiers of two-dimensional correlation spectroscopy. Part 1. New concepts and noteworthy developments, J. Mol. Struct. 1069 (2014) 3-22.

[11] I. Noda, Frontiers of two-dimensional correlation spectroscopy. Part 2. Perturbation methods, fields of applications, and types of analytical probes, J. Mol. Struct. 1069 (2014) 23-49.

[12] Y. Park, I. Noda, Y.M. Jung, Novel developments and applications of two-dimensional correlation spectroscopy, J. Mol. Struct. 1124 (2016) 11-28.

[13] B. Czarnik-Matusewicz, K. Murayama, R. Tsenkova, Y. Ozaki, Analysis of near-infrared spectra of complicated biological fluids by two-dimensional correlation spectroscopy: protein and fat concentration-dependent spectral changes of milk, Appl. Spectrosc. 53 (1999) 1582-1594.

[14] B. Czarnik-Matusewicz, K. Murayama, Y. Wu, Y. Ozaki, Two-dimensional attenuated total reflection/infrared correlation spectroscopy of adsorption-induced and concentration-dependent spectral variations of β-lactoglobulin in aqueous solutions, J. Phys. Chem. 104 (2000) 7803-7811.

[15] B. Czarnik-Matusewicz, S.B. Kim, Y.M. Jung, A study of urea-dependent denaturation of β-lactoglobulin by principal component analysis and two-dimensional correlation spectroscopy, J. Phys. Chem. B 113 (2009) 559-566.

[16] Y.M. Jung, B. Czarnik-Matusewicz, Y. Ozaki, Two-dimensional infrared, two-dimensional Raman, and two-dimensional infrared and Raman heterospectral correlation studies of secondary structure of β-lactoglobulin in buffer solutions, J. Phys. Chem. B 104 (2000) 7812-7817.

[17] K. Murayama, B. Czarnik-Matusewicz, Y. Wu, R. Tsenkova, Y. Ozaki, Comparison between conventional spectral analysis methods, chemometrics, and two-dimensional correlation spectroscopy in the analysis of near-infrared spectra of protein, Appl. Spectrosc. 54 (2000) 978-985.

[18] Y. Wu, K. Murayama, B. Czarnik-Matusewicz, Y. Ozaki, Two-dimensional attenuated total reflection/infrared correlation spectroscopy studies on concentration and heat-induced structural changes on human serum albumin in aqueous solution, Appl. Spectrosc. 56 (2002) 1186-1193.

[19] A. Litwińczuk, S.R. Ryu, L.A. Nafie, J.W. Lee, H.I. Kim, Y.M. Jung, B. Czarnik-Matusewicz, The transition from the native to the acid-state characterized by multi-spectroscopy approach: study for the holo-form of bovine α-lactalbumin, Biochim. Biophys. Acta 1844 (2014) 593-606.

[20] B. Czarnik-Matusewicz, Y.M. Jung, Two-dimensional mid-infrared correlation spectroscopy in protein research, in: M. Baranska (Ed.), Optical Spectroscopy and Computational Methods in Biology and Medicine, Springer, Dordrecht, 2014, pp. 213-250.

[21] I. Noda, Two-dimensional correlation analysis of unevenly spaced spectral data, Appl. Spectrosc. 57 (8) (2003) 1049-1051.

[22] I. Noda, Determination of two-dimensional correlation spectra using the Hilbert transform, Appl. Spectrosc. 54 (7) (2000) 994-999.

[23] S.R. Ryu, I. Noda, C.-H. Lee, P.H. Lee, H. Hwang, Y.M. Jung, Two-dimensional correlation analysis and waterfall plots for detecting positional fluctuations of spectral changes, Appl. Spectrosc. 65 (4) (2011) 359-368.

[24] M. Unger, B. Harnacke, I. Noda, H.W. Siesler, Solvent interactions in methanol/N,N-dimethylamide binary systems studied by Fourier transform infrared-attenuated total reflection (FT-IR/ATR) and two-dimensional correlation spectroscopy, Appl. Spectrosc. 65 (8) (2011) 892-900.

[25] M.A. Czarnecki, Frequency shift or intensity shift? The origin of spectral changes in vibrational spectra, Vib. Spectrosc. 58 (1) (2012) 193-198.

[26] D. Kang, S.R. Ryu, Y. Park, B. Czarnik-Matusewicz, Y.M. Jung, pH-induced structural changes of ovalbumin studied by 2D correlation IR spectroscopy, J. Mol. Struct. 1069 (2014) 299-304.

[27] B. Pastrana-Rios, M. Reyes, J. De Orbeta, V. Meza, D. Narváez, A.M. Gómez, A.R. Nassif, R. Almodovar, A.D. Casas, J. Robles, A.M. Ortiz, L. Irizarry, M. Campbell, M. Colón, Relative stability of human centrins and its relationship to calcium binding, Biochemistry 52 (2013) 1236-1248.

[28] M. Thomas, H. Richardson, Two-dimensional FT-IR correlation analysis of the phase transitions in a liquid crystal, 4'-n-octyl-4-cyanobiphenyl (8CB), Vib. Spectrosc. 24 (2) (2000) 137-146.

[29] J. Szwed, K. Cieślik-Boczula, B. Czarnik-Matusewicz, A. Jaszczyszyn, K. Gąsiorowski, P. Świątek, W. Malinka, Moving-window 2D correlation spectroscopy in studies of fluphenazine-DPPC dehydrated film as a function of temperature, J. Mol. Struct. 974 (2010) 192-202.

[30] S. Morita, H. Shinzawa, I. Noda, Y. Ozaki, Perturbation-correlation moving-window two-dimensional correlation spectroscopy, Appl. Spectrosc. 60 (4) (2006) 398-406.

[31] Y.M. Jung, H.S. Shin, S.B. Kim, I. Noda, Two-dimensional gradient mapping technique useful for detailed spectral analysis of polymer transition temperatures, J. Phys. Chem. B 112 (12) (2008) 3611-3616.

[32] I. Noda, Techniques useful in two-dimensional correlation and codistribution spectroscopy (2DCOS and 2DCDS) analyses, J. Mol. Struct. 1124 (2016) 29-41.

[33] I. Noda, Two-dimensional codistribution spectroscopy to determine the sequential order of distributed presence of species, J. Mol. Struct. 1069 (2014) 50-59.

[34] F. Ramer, L. Ashton, Two-dimensional codistribution spectroscopy applied to UVRR and ROA investigations of biomolecular transitions, J. Mol. Struct. 1124 (2016) 173-179.

[35] I. Noda, Scaling techniques to enhance two-dimensional correlation spectra, J. Mol. Struct. 883-884 (2008) 216-227.

[36] F.E. Barton II, D.S. Himmelsbach, J.H. Duckworth, M.J. Smith, Two-dimensional vibration spectroscopy: correlation of mid- and near-infrared regions, Appl. Spectrosc. 46 (3) (1992) 420-429.

[37] L. Szyc, J. Guo, M. Yang, J. Dreyer, P.M. Tolstoy, E.T.J. Nibbering, B. Czarnik-Matusewicz, T. Elsaesser, H.-H. Limbach, The hydrogen-bonded 2-pyridone dimer model system. 1. Combined NMR and FT-IR spectroscopy study, J. Phys. Chem. A 114 (2010) 7749-7760.

[38] I. Noda, Projection two-dimensional correlation analysis, J. Mol. Struct. 974 (1-3) (2010) 116-126.


Figure captions

Figure 1 Simulated IR spectra in the Amide I region (A) of four component conformational secondary structures comprising the assumed model protein, and their population change profiles (B) induced by the rising temperature.

Figure 2 Simulated temperature-dependent IR spectra in the Amide I region of a model protein (A), and corresponding dynamic spectra (B) during the heating from 40 °C to 90 °C.

Figure 3 Contour map representations of the synchronous (A) and asynchronous (B) 2D IR correlation spectra obtained from the temperature-dependent simulated IR spectra of a model protein in Figure 2. Shaded areas indicate negative correlation intensity regions.

Figure 4 Autocorrelation MW2D (A), synchronous PCMW2D (B), and asynchronous PCMW2D (C) spectra of the simulated model protein IR spectra in Figure 2A.

Figure 5 Synchronous (A) and asynchronous (B) 2D IR correlation spectra of simulated model protein constructed from the select temperature range from 40 °C to 60 °C.

Figure 6 Synchronous (A) and asynchronous (B) 2D IR correlation spectra of simulated model protein constructed from the select temperature range from 60 °C to 75 °C.

Figure 7 Synchronous (A) and asynchronous (B) 2D IR correlation spectra of simulated model protein constructed from the select temperature range from 75 °C to 90 °C.

Figure 8 Asynchronous 2D codistribution spectrum generated from the simulated IR spectra of a model protein (Figure 2). Shaded areas indicate negative correlation intensity regions.

Figure 9 Pareto-scaled synchronous (A) and asynchronous (B) 2D IR correlation spectra obtained from the temperature-dependent simulated IR spectra of a model protein in Figure 2. Shaded areas indicate negative correlation intensity regions.

Figure 10 Synchronous (A) and asynchronous (B) 2D IR correlation spectra obtained from the temperature-dependent simulated IR spectra of a model protein after the NSP treatment using the intensity variations of the turn component at 1670 cm⁻¹ as the projector. Shaded areas represent negative intensity.


Highlights

• A tutorial for the two-dimensional correlation spectroscopy (2DCOS) which is applicable to a broad area, including the vibrational spectroscopic study of proteins and related systems
• Step-by-step procedures for constructing and interpreting vibrational 2D correlation spectra
• Additional tools useful in the 2DCOS analysis of proteins, such as data segmentation assisted with the moving-window analysis, 2D codistribution analysis, Pareto scaling, and null-space projection