Chemical Physics 298 (2004) 17–28 www.elsevier.com/locate/chemphys
Relationship between structure and optical properties in green fluorescent proteins: a quantum mechanical study of the chromophore environment Teodoro Laino, Riccardo Nifosı, Valentina Tozzini
*
NEST-INFM and Scuola Normale Superiore, Piazza dei Cavalieri 7, I-56126 Pisa, Italy Received 2 July 2003; accepted 24 October 2003
Abstract The present paper reports the first quantum mechanical modeling of a realistic chromophore environment of the green fluorescent proteins (GFPs). Based on density functional theory (DFT) and semiempirical calculation, we studied the effect of each amino acid in close contact with the chromophore and derived a quantitative and predictive relationship between structure and optical properties. On the basis of this relationship, the structural, optical and vibrational properties of the different states of wildtype GFP and of two mutants, EGFP (F64L/S65T) and E2 GFP (F64L/S65T/T203Y), are then specifically studied. This approach can be applied to infer some structural features of spectroscopic states for which no structural data is available, such as the dark states involved in GFP photodynamics. Ó 2003 Elsevier B.V. All rights reserved. Keywords: Green fluorescent proteins; Density functional theory; Semiempirical calculations
1. Introduction The wild-type green fluorescent protein (WT-GFP) and its mutants form a family of fluorescent tags largely used in cell biology [1]. The autocatalytic formation of the chromophore and the virtual absence of interference with the marked protein in fusion tags have determined in the last decades a huge development of the technologies based on GFPs. Consequently, a large amount of mutants have been produced, with different optical properties [2]. In spite of this, a clear relationship between structure and photophysical properties has not been established, yet, preventing a rational tailoring of mutants to specific purposes. The 4-(p)-hydroxybenzylidene-imidazolidin-5-one chromophore is formed by a cyclization–dehydration– oxidation of the three amino acids at positions 65–66–67
*
Corresponding author. Tel.: +39-050-509433; fax: +39-050-509417. E-mail address:
[email protected] (V. Tozzini).
0301-0104/$ - see front matter Ó 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.chemphys.2003.10.040
(Ser–Tyr–Gly in WT-GFP) and is buried inside the highly protective b-barrel fold [1,3,4]. Within the protein, the chromophore is present in two protonation forms [2,3,5,6], corresponding to the two absorption peaks of the protein spectrum. Namely, a neutral form carrying a phenolic Tyr66 is associated with the shortwavelength absorption (called A, abs. at 398 nm in WT), while an anionic form carrying a phenolate Tyr66 is associated with the long-wavelength peak (called B, abs. at 471 nm in WT) [3,4]. Several femtosecondresolved spectroscopic studies have shed light on the fluorescence mechanism of GFP [5,7,8]. The green (482 nm) fluorescence following excitation of B stems from direct decay of the excited B state. By contrast, the fluorescence resulting from excitation of A occurs via excited state proton transfer through a hydrogenbond network from the chromophore to residue Glu222 (see Fig. 1), leading to an intermediate anionic I state emitting at 506 nm (in WT). State A is then recovered from I by a ground-state inverse proton transfer from Glu222 to the chromophore [5,6]. As a consequence, A and I are thought to be structurally similar apart from
18
T. Laino et al. / Chemical Physics 298 (2004) 17–28
Fig. 1. Schematic representation of the largest models for the active sites of EGFP state A (a) and state B (b) and E2 GFP state A (c) and state B (d). Cut bonds are indicated by wavy lines. Dotted lines represent hydrogen bonds, as obtained by simulations.
the protonation of the chromophore and of the other residues involved in the process [4]. Conversely, B has a different structure of the chromophore environment, where the chromophore and Glu222 are not connected by a hydrogen-bond network. More recent studies have confirmed this picture, although revealing a rather complex landscape of variants in the GFP family. Mutations occurring in the chromophore or chromophore environment can modify the relative population of A and B states and their absorption/emission wavelengths up to covering almost all the visible spectrum from blue to yellow [2]. Some empirical rules can be extracted from a detailed taxonomy of the over 100 mutants existing so far: for instance, mutation of Thr203 red shifts the absorption/ emission of B state, especially if the new residue is aromatic [11], while mutation of Tyr66 into Histidine or Tryptophan produces a blue shift [12]. These effects have been attributed to changes in the extension of the delocalized electron cloud or changes in the polarizability of the chromophore environment [13,14]. A few theoretical works based on quantum-chemistry calculations on the chromophore have rationalized the effects of the chemical changes and of hydrogen bonds on the chromophore [14,15]. However, a precise understanding of which interactions are responsible for the blue and especially red shifts in the absorption/ emission is still lacking, since no calculations were performed on the chromophore embedded in a realistic protein environment.
Changes in the relative stability between A/I/B states were mostly attributed to the stabilizing effect of hydrogen bonds on the phenolate of the chromophore in B state [11,16,17], or to the destabilization of the hydrogen-bond network connecting the chromophore with Glu222 in A state, produced for instance by mutation S65T [4]. In our recent paper [18] we modeled the structure and dynamics of the two mutants EGFP (F64L/S65T) and E2 GFP (F64L/S65T/T203Y) showing that the A/B equilibrium involves a different internal dynamics with conformational changes of residues Glu222 and Thr65. Possible structures of I were proposed [13,19] although little is known about this intermediate, apart from its absorption energy in a few mutants [20]. The first focus of this work is on the relationship between the mutations in the active site (i.e. the chromophore environment) and the excitation energies. We address this task by a theoretical study of the structural, electronic and dynamical properties of the chromophore embedded in a realistic protein environment, i.e. in presence of all the close residues. The second part of this work is focused on the properties of the ‘‘bright states’’ (i.e. A, B, I) of the above mentioned EGFP and E2 GFP mutants. The former is a representative of the ‘‘green’’ class and the latter of the ‘‘yellow’’ class. They differ by just one mutation, i.e. T203Y, so that the comparison is particularly straightforward. Results on WT-GFP are also given for completeness.
T. Laino et al. / Chemical Physics 298 (2004) 17–28
2. Computational methods and model systems Optimized geometries, electronic structures and vibrational properties were calculated within the density functional theory (DFT) framework, using the Becke– Lee–Yang–Parr exchange and correlation functional [21]. Valence electron orbitals were expanded in plane waves with a cutoff of 60 Ryd, and Troullier–Martins pseudopotentials were used to implicitly treat the core electrons [22,23]. Orthorhombic simulation boxes were employed with cell dimensions varying from 30 of empty to 40 a.u., in order to leave at least 5 A space between periodic images. Harmonic vibrational frequencies and eigenmodes were calculated by numerical differentiation using finite displacements of 0.01 a.u. Car–Parrinello [24] molecular dynamics simulations were performed using the electronic mass preconditioning scheme [25] with a time step of 0.193 fs (8 time atomic units). The code CPMD3.4.3 [24,26] was used for the Car–Parrinello and DFT calculations. The excitation energies of the DFT optimized geometries were evaluated within a semiempirical approach (MNDO) using MNDOC [27] parameters, superior to MNDO for this purpose [28]. The electronic wave functions are of configuration interaction (CI) type, with molecular orbitals obtained in a self-consistent field (SCF) calculation with fractional occupation numbers [29]. We used complete active space (CAS) CI wavefunctions with six electrons in six molecular orbitals ð6; 6Þ in order to include all those relevant for the S0 ! S1 transition, and extending the floating occupation number treatment to 12 electron in 12 orbitals. Calculations were performed with a development version of MOPAC2000 [30]. We analyzed several different model systems. The largest models for the active site are schematically represented in Fig. 1, and include all the functional groups of residues in direct interaction (hydrogen bond, aromatic or steric) with the chromophore. We cut only single bonds (indicated by wavy lines in Fig. 1) which were saturated with hydrogen atoms and the heavy atoms at the boundary were kept fixed during structural optimizations and molecular dynamics. The initial positions were taken from thermalized classical molecular dynamics models in our previous work [18].
3. Calculations on the ‘‘training set’’ models In order to monitor the effect of each residue, we performed calculations and simulations on restricted models including, beyond the chromophore, only a few functional groups at a time. These models, listed and described in Table 1, constitute a ‘‘training set’’: on the basis of their calculated properties we derived a relationship
19
Table 1 Numbering, description and excitation energies of the ‘‘training set’’ models #
Residues included in the model
DFT (eV)
CI (eV) (O.S.)
1 2 3 4 5 6 7 8 9 10
NC NC Tyr203 NC W1 W3 NC Arg96 Gln94 NC Arg96 Gln94 W1 W3 AC AC Tyr203 AC W1 Tyr145 His148 Thr65 AC Arg96 Gln94 AC Arg96 Gln94 W1 Tyr145 His148 Thr65
2.23 2.18 2.19 2.07 2.02 1.81 1.81 1.92 1.71 1.82
3.519 3.517 3.462 3.219 3.103 2.698 2.619 2.707 1.969 2.258
(1.21) (1.12) (1.22) (1.19) (1.28) (1.45) (1.36) (1.31) (0.0005) (0.04)
Legend: NC, neutral chromophore; AC, anionic chromophore. In models 8 and 10 residues His148 and Tyr145 are represented by a water molecule. The column DFT contains the energy gap between HOMO and LUMO calculated within DFT. The CI column contains the S0 –S1 excitation energy, obtained within the CAS-CI approach. The values are in eV. Oscillator strengths are given in parentheses.
between structure and excitation energy which we subsequently applied to the complete active-site models. 3.1. Optimized geometries and excitation energies The excitation energies and the chromophore bond lengths of the ‘‘training set’’ models are reported in Tables 1 and 2. The DFT column in Table 1 reports the energy difference between highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) calculated in the DFT frame, while the CI column reports the excitation energy and oscillator strength of the S0 –S1 transition calculated within the CAS-CI approach. The CI excitation energies for the neutral and anionic isolated chromophore (NC and AC) are in good agreement with other theoretical values reported in the literature [14]. Also in agreement with previous calculations is the composition of the electronic excitation, which is predominantly HOMO–LUMO [31] except for models 9 and 10, whose prevalent component is HOMO–LUMO+ (i.e. the second lowest unoccupied orbital), though the transition remains always of p–p type. As it is well known and apparent from the comparison of the two columns, the DFT gap is a rough underestimate (about 50% on average) of the excitation energy [32], although the DFT calculation reproduces correctly the trend of variation of the excitation energy. From the analysis of the training models, some indications can be derived on the effect of each interacting residue. NC has the largest excitation energy, meaning that all the interactions with the neighbor residues tend to red shift its absorption. The red shift produced by the two hydrogen-bond donors Arg96 and Gln94 can be estimated around 30–40 nm (compare model 4 with 1
20
T. Laino et al. / Chemical Physics 298 (2004) 17–28
Table 2 Bond lengths of the chromophore in the ‘‘training set’’ models optimized by DFT calculations. Each column corresponds to a bond distance, numbered as shown in the chemical scheme below the table (left). Each row corresponds to a different model numbered as in Table 1. Below the table the two resonant forms of the anionic chromophore are shown: benzenoid (left) and quinonoid (right). In the last row the experimental data for the neutral crystallized chromophore are reported [49]. The errors with respect to the corresponding model 1 are 1% on average Model
Bond 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
1 2 3 4 5 6 7 8 9 10
1.237 1.236 1.237 1.272 1.277 1.252 1.256 1.246 1.313 1.291
1.495 1.498 1.483 1.465 1.464 1.463 1.462 1.472 1.432 1.437
1.364 1.375 1.366 1.372 1.378 1.390 1.394 1.377 1.414 1.396
1.449 1.473 1.443 1.433 1.430 1.418 1.412 1.426 1.400 1.401
1.408 1.412 1.411 1.405 1.409 1.413 1.415 1.413 1.409 1.408
1.316 1.323 1.320 1.309 1.313 1.313 1.317 1.317 1.316 1.313
1.399 1.398 1.397 1.418 1.413 1.402 1.397 1.391 1.406 1.399
1.416 1.423 1.422 1.383 1.392 1.426 1.423 1.425 1.394 1.387
1.417 1.417 1.418 1.417 1.423 1.433 1.437 1.424 1.447 1.435
1.390 1.391 1.386 1.386 1.380 1.374 1.371 1.380 1.364 1.367
1.400 1.399 1.406 1.407 1.411 1.451 1.454 1.427 1.461 1.442
1.403 1.400 1.409 1.407 1.413 1.454 1.460 1.428 1.465 1.441
1.391 1.408 1.389 1.380 1.383 1.375 1.375 1.379 1.367 1.366
1.416 1.428 1.418 1.419 1.422 1.434 1.436 1.425 1.444 1.435
1.387 1.397 1.369 1.374 1.356 1.285 1.273 1.323 1.268 1.291
Experimental
1.227
1.476
1.366
1.457
1.418
1.315
1.369
1.380
1.394
1.385
1.374
1.379
1.375
1.401
1.385
and 5 with 3), while the effect of the hydrogen bonds on the phenol oxygen is weaker (5–15 nm, compare model 3 with 1 and 5 with 4). While these effects compare well to experimental absorption in the proteins, the red shift produced by Arg96 and Gln94 on the anionic chromophore (100 nm, compare model 9 with 6 and 10 with 8) is overestimated since these two models display an excitation energy which is unrealistically low with respect to the experimental values in the proteins. A detailed analysis of the electronic structure of these models reveals a very strong charge transfer on Arg96, correlated with an unusual stretching of the C@O bond. This is reflected also in the negligible oscillator strengths of these transitions (Table 1), and may be an artifact of the models due to the positively charged Arg96 being completely unscreened by other residues. As shown in the next section, this effect is absent in more realistic models. The presence of the three bond donors on the phenolate of the anionic chromophore produces a considerable blue shift (80 nm, compare model 10 with 9). This is in agreement with the experimental observation that mutations such as T203V and T203I which eliminate a hydrogen bond on the chromophore phenolate in the B state tend to red shift its absorption energy [2]. Furthermore, the competition between the red shift due to hydrogen-bond donors to the carbonyl oxygen and the blue shift due to the hydrogen-bond donors to the phenolic oxygen in the anionic state could explain the very recent experimental observation that the absorption spectra of the anionic chromophore in the gas phase is very similar to that in the protein [9,10], although the
experimental value of the absorption wavelength (479 nm) differs from our theoretical value of about 4%. Finally, the interaction with Tyr203 produces a red shift of about 15 nm in the case of AC, while the effect is very small for NC. This is in agreement with the observed effect of mutation T203Y which is substantially to red shift only the B state, absorbing at 515 nm in T203Y [11]. This behavior was explained in terms of the change in the electric dipole upon excitation, which would be larger in the anionic state [13]. 3.2. Relationship between structure and optical properties By monitoring the bond-length variations in the various models, we derive here a simple relationship between the excitation energy and the geometrical properties of the chromophore. We followed an analogy with a property of p-conjugated linear oligomers: going from shorter to longer chains, the energy gap decreases, the double bonds lengthen and the single bonds shorten. This results in a direct correlation between energy gap and bond-length alternation (BLA, defined as the difference between the single and double bonds), although the quantitative behavior is still a matter of debate [33,34]. The GFP chromophore is a p-conjugated system, although the presence of the rings makes it a less simple system than linear oligomers. In Fig. 2 the bond lengths of the chromophore in the training models are reported as a function of the excitation energy. As the excitation energy decreases the length of single bonds decreases and that of double bond increases. The behavior can be
T. Laino et al. / Chemical Physics 298 (2004) 17–28
versus excitation energy (in eV). C–O Fig. 2. Bond distances (in A) bonds are reported in the bottom panel, C–C bonds on the bridge connecting the phenolic ring in the central panel, and C–C bonds on the phenolic ring in the top panel. Different distances in each panel are represented with different symbols (filled dots, open dots and crosses). The fitting lines di ¼ ai þ bi Eex;CI are reported (a different style for each distance). The bond numbering of Table 2 is reported near each line.
fairly well fitted by a linear function. Furthermore, the single-/double-bond alternation is present along the chain formed by the bonds 1–2–3–4–(9, 14)–(10, 13)– (11, 12)–15 in spite of the presence of the phenolic ring, provided each couple of the almost identical bonds symmetric with respect to the 4–15 axis (i.e. (9, 14), (10, 13), (11, 12)) is averaged. Conversely, bonds 5, 6, 7 and 8 in the heterocyclic ring display neither BLA nor a regular behavior as a function of the excitation energy (data not shown). In an alternative view, BLA measures the degree of mixing between the benzenoid and the resonant quinonoid form of the chromophore (shown in Table 2). Clearly, the interactions of the chromophore with the environment can drive the resonance towards the quinonoid or the benzenoid form and correspondingly modify the BLA. Such a view rationalizes why bonds 5, 6, 7 and 8, which are unchanged in the two forms, are not correlated to changes in the excitation energy. In order to take into account the chemical difference of the bonds in the chromophore, we defined the BLA parameter as an appropriate linear combination of the bond lengths rather than the simple difference between single- and double-bond lengths as is done in conjugated linear polymers. The coefficients of this linear relation were derived in the following way: we performed a linear fit for each bond length di (i runs over the bonds) as a function of the excitation energy E di ðEÞ ¼ ai þ bi E:
ð1Þ
The parameter a of model J (J running from 1 to 10 in the training set) is then defined as
21
Fig. 3. Bond-length alternation parameter a versus excitation energy. Big dots: training models 1–10 (Table 1). Solid line: linear fit. Small dots: absorption energies and a values derived from experimental works (PDB codes [35] are indicated as references). CRO corresponds to the structure of the crystallized chromophore [49] and to the absorption of the neutral chromophore in water [14]. The structure 1C4F appears twice because its attribution to A or B state was uncertain. The dotted lines describe the evaluation of the excitation energy for the models for EGFP and E2 GFP (states A, B, I).
aJ ¼
1 X diJ ai ; N i6¼5;6;7;8 bi Emax
ð2Þ
where Emax ¼ 3.519 eV is the energy of NC, diJ is the distance of bond i in model J and N ¼ 11 is the number of bonds displaying the linear behavior. a as so defined is the average of the suitably renormalized di that display the linear behavior as a function of the excitation energy. The parameter a of the training models is plotted in Fig. 3 as a function of the excitation energy (big dots, linear correlation coefficient ¼ 0.9739). We used Eq. (2) to calculate a in selected X-ray structures of representative green and yellow mutants, stabilized in A or B states. These values are reported in Fig. 3 as a function of the measured absorption peak energy (small dots labeled by the corresponding PDB code [35]). It should be noted that these data bear a sizable error introduced by the resolution of X-ray structures, by the refinement procedures and by the width of the absorption peaks. Nonetheless they show the same linear correlation between a and the excitation energy, which supports our definition of a. We remark that the relationship we found between the geometric parameter a and the excitation energy allows to predict the excitation energy of a GFP state based only on the geometrical properties of the chromophore, provided the environment is properly modeled. As an example, our relation explains the results obtained in a recent time-dependent DFT calculation of the excitation spectrum of GFP [36]. In this work, the authors perform a minimization of the whole
22
T. Laino et al. / Chemical Physics 298 (2004) 17–28
protein treating the chromophore at the AM1 semiempirical level while using an empirical force–field hamiltonian to describe the surrounding protein matrix. They then calculate the TDDFT spectrum of the isolated minimized chromophore, thereby neglecting the interactions of the electronic cloud with the protein matrix. Nonetheless, the resulting excitation spectrum displays a remarkably good agreement with absorption measurements. This indicates that the relevant part of the protein matrix effect on the excitation energy is accounted for by the distortion of the chromophore structure. 3.3. Correlations between the absorption energies and the vibrational frequencies Since changes in bond length correspond to changes in bond strength, correlations between the excitation energy and the vibrational frequencies should also be expected. This was suggested in a previous theoretical work by some of us [37] as a possible explanation for the observed linear correlation between the excitation energy and a ‘‘fingerprint’’ Raman band located in the 1610–1650 cm1 range [38]. If this is the case, the correlation should be expected mostly for modes that involve stretching of those double and single bonds that enter in the definition of the a parameter. Fig. 4 reports the calculated vibrational frequencies of the two highest modes below 1700 cm1 , that we labeled m1 and m2 . Both of them are bond stretching modes delocalized along the chromophore, although the highest (m1 , open circles in Fig. 4) is mainly localized on C@O bond 1, and the second (m2 filled
Fig. 4. Vibrational frequencies m1 and m2 (in cm1 ) of the ‘‘C@O’’ (open symbols) and ‘‘C@C’’ (filled symbols) modes as a function of the excitation energy (in eV). The circles correspond to the training set models, the triangles to the models for the protein states.
circles) on C@C bond 3 (see Table 1 for bond numbering). The frequencies show a linear dependence on the excitation energy, although the correlation is less good than for a, especially in the higher-energy region, and some degree of mixing between the two modes is observed. The linear behavior is however good, especially for the lower mode in the region between 2.4 and 3.4 eV. Interestingly, according to recent mode assignments based on isotopic labeling [39] and on Raman intensities [40], this lower mode corresponds to the Raman band that displays the linear correlation with the absorption energy [38]. Although displaying an analogous behavior, the higher mode turns out to be Raman inactive, and thus its linear behavior was not observed. In our calculations, the correlation line matches the one from Raman measurements [38] within 3%.
4. The bright states of EGFP and E2 GFP In this section we report the properties of realistic models for the active site of the bright states of EGFP and E2 GFP, namely A and B states. Although our primary aim is to study these two mutants, some results are given on WT-GFP for completeness. The starting structures were optimized by a simulated annealing with Car–Parrinello dynamics [41]. We also modeled the metastable I states of EGFP and E2 GFP. In these mutants the excited state proton transfer is less relevant than in WT-GFP because other mechanisms of relaxation from A are more efficient [42,43]. This means that the population of I will be low [44]. Nonetheless, the absorption of the I state was measured in similar mutants [20,45], so that it is interesting to compare our prediction with the experimental data (see below). In order to start with reliable structures for the I states, we performed a simulation by forcing the first step of proton transfer, i.e. by bringing the involved hydrogen within bonding distance to W1 (step ‘‘i’’ in Fig. 6). We then let the system evolve with the proton restrained to W1. As a result, after about 20 fs of dynamics steps ‘‘ii’’ and ‘‘iii’’ take place almost simultaneously. The system is then driven towards state I and relaxes within 200 fs. Being performed in the ground state, this simulation ignores (at least) the different charge distribution of the chromophore in the excited state. In spite of this, the sequence of events we found is in agreement with the recent classical molecular dynamics simulation for WT-GFP of [47], where effective charges designed to model the excited state proton transfer were used. However we remark that, at this stage, we are interested in the structure of ground state I, which is likely to be little affected by the early events of the proton transfer.
T. Laino et al. / Chemical Physics 298 (2004) 17–28
23
Fig. 5. Optimized structures of states of A, B and I of EGFP and E2 GFP. The residues names are indicated in the structures of A states. Code color: green ¼ C, red ¼ O, blue ¼ N, white ¼ H, magenta ¼ boundary atoms.
4.1. Geometries and excitation energies The optimized structures of the A, B and I states of EGFP and E2 GFP are represented in Fig. 5. The hydrogen-bond lengths of the different models are reported in Table 3 (labeling as in Fig. 1). An accurate comparison with structural data is difficult, since X-ray structures of the same mutant can bear differences due both to experimental resolution and to the refinement
procedure, and furthermore, no data for EGFP and E2 GFP are available. However in Table 3 we compare our results with the structures 1GFL (WT) and 1EMA (S65T) which are often considered the paradigms for state A and B respectively. The hydrogen-bond lengths are reproduced within 5% in the case of state A of WT, where the theoretical and the experimental structure can be directly compared. A slightly worse agreement (within 10% although on average still around 5%) is
24
T. Laino et al. / Chemical Physics 298 (2004) 17–28
Fig. 6. Proton transfers from A to I. i, ii and iii indicate the different proton transfer steps.
obtained by comparing the calculations of B state of WT with the experimental structure of B state of S65T. In general, the topology of the hydrogen-bond network of states A and B is reproduced in our calculations: the hydrogen bond between the chromophore and water W1 and between Glu222 and Ser205 are absent in state B, while a bond between His148 and the chromophore is present. The hydrogen bond between Thr203 and the chromophore is present only in the B states of EGFP and WT-GFP but not in E2 GFP, where Thr203 is absent. This absence has a relevant role on the absorption energy, as already pointed out in Section 3.3 and addressed again below. In our calculations the hydrogenbond network of the I states has a hybrid structure between A and B. The bond lengths of the chromophore are reported in Table 4. From these data, the as can be calculated and the excitation energies can be extrapolated from Eqs. (1) and (2). These values are reported in Table 5 and compared with absorption data. A direct calculation of the excitation energies with the same ð6; 6Þ active space is not expected to give in general as accurate results for the complete active site models as those for the training set models. This is due to the fact that, whereas the CI space used for the calculations is big enough for the small systems of the ‘‘training set’’, it may be not suitable for the larger realistic models. Table 5 reports our results for the direct calculation of EGFP excitation energy. In this case we still obtain a reasonable agreement between the calculated values and those predicted by the linear relation, though for state I the small value of the oscillator strength already hints to an inadequacy of the CAS size. Conversely, in the case of E2 GFP we find that the direct calculation suffers from a systematic overestimation of about 0.1–0.3 eV, which implies a complete failure in predicting the red shift of E2 GFP states with respect to EGFP states. This is probably due to the presence of Tyr203 introducing
several new orbitals, so that the total number of orbitals included in the correlation is not sufficient. One would need to perform the calculations on a larger CI space in order to be sure not to miss any orbital relevant to describe the excitation of the chromophore. Indeed, by increasing the CAS in the CI calculation from ð6; 6Þ to ð8; 12Þ we observe a decrease in the excitation energy of E2 GFP state A from 3.279 to 3.233 eV. However, such a procedure is much less affordable than the simple prediction based on the a parameter, which on the other hand gives very good results. The excitation energies evaluated from a via the linear relation in Fig. 3 (see dotted arrows) agree with the experimental absorption data on average within 1–2% with a maximum discrepancy of 2.5%. Such a good agreement indicates that our models contain all the necessary interacting residues. The absorption wavelengths of WT-GFP and EGFP are very similar, having the same topology of hydrogen-bond contacts and electrostatic interactions. The small differences can be explained as due to different hydrogen-bond strengths. Conversely, the red shift of B state of E2 GFP with respect to EGFP is sizable and is particularly well reproduced in our calculations. The analysis of the previous sections allows us to decompose this red shift in two parts: 10–15 nm can be attributed to the presence of the interaction of the chromophore with Tyr203, the remaining 20–25 nm can be attributed to the removal of a hydrogen bond from the phenolate. The separation of these two contributions allows us to rationalize the absorption measurements performed on variants carrying mutations at position 203. Indeed, the substitution of Thr203 by an aliphatic residue such as Val or Ile red shifts the absorption of the B state by about 24 nm [11]. The insertion of Tyr at position 203 results instead in a red shift of about 33 nm [11]. Interestingly, the emission wavelength is red shifted only by aromatic mutations, so that the removal of the hydrogen-bond donor to the phenolate does not seem to influence very much the emission energy. This is consistent with state I emitting at (almost) the same energy as state B, though lacking a hydrogen bond. During the dynamics of the I state of E2 GFP (see below) we observed a significant occupation of a configuration (I0 in the following) with the hydroxyl hydrogen of Thr65 pointing towards the nitrogen of imidazolidinone and the hydrogen bond between Ser205 and Glu222 broken. In this configuration the hydrogenbond network is more similar to that of B, although the carboxyl hydrogen of Glu222 is still in the anti configuration, as in I. We optimized both I and I0 configurations (only I is shown in Fig. 5). Our predictions for EGFP are compared with the only data available for the 0–0 transition of state I measured with hole-burning spectroscopy [45] in the mutant S65T which closely resembles EGFP. The red
Table 3 Hydrogen-bond distances in optimized models for A, B and I states of EGFP and E2 GFP and WT Model
H-bond Gln94-NH 2 .. . Tyr66-C@O
Tyr66-O .. . W1
His148-NH .. . W1
Asn146-C@O .. . W1
Ser205-OH .. . W1
Thr203-OH .. . Tyr66-O
Tyr145-OH .. . Tyr66-O
His148-NH .. . Tyr66-O
2.759 2.724 2.682
2.883 3.118 2.874
2.690 2.782 2.686
3.160
2.944 2.895 3.022
2.606 2.763 2.632
3.107
(4.073)
3.043 3.236
E GFP A B I I0
2.706 2.582 2.624 2.631
2.951 2.863 3.004 3.115
2.744 2.659 2.579 2.618
3.266
3.062 (3.069) 3.063 2.479
2.647 2.677 2.721 2.865
WT-GFP A B
2.716 2.628
3.052 3.003
2.619 2.746
3.405
2.977 3.145
2.653 2.926
WT exp (1GFL) A 2.67
2.96
2.62
3.29
2.86
2.70
S65T exp (1EMA) B 2.73
3.03
2.70
2.85
2.70
EGFP A B I
Ser205-OH .. . Glu222-O 2.546 2.537
Glu222-O .. . Thr65-OH 2.807 2.764 2.885
2
3.405 2.898
2.609 2.864 (4.586) 3.898
3.282
2.67
(3.475)
(4.60)
2.605
2.619
2.720 2.699
2.74
2.73
2.802
2.85
2.696 2.720 2.781 2.720
T. Laino et al. / Chemical Physics 298 (2004) 17–28
Arg96-NH 2 .. . Tyr66-C@O
2.81
In each column the distance between the indicated donor and acceptor is reported (see Fig. 1 for residue labeling). Very weak bonds are indicated in parentheses. Experimental values for state A of WT-GFP and state B of S65T are reported (PDB codes of the X-ray structures are indicated in parentheses).
25
26
T. Laino et al. / Chemical Physics 298 (2004) 17–28
Table 4 Bond lengths of the chromophore in the optimized models for the bright states of EGFP, E2 GFP and WT-GFP Model
Bond 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
EGFP A B I
1.270 1.272 1.281
1.465 1.459 1.454
1.376 1.389 1.395
1.433 1.425 1.422
1.403 1.406 1.408
1.308 1.326 1.314
1.433 1.407 1.420
1.387 1.404 1.397
1.421 1.435 1.435
1.383 1.375 1.375
1.409 1.442 1.443
1.411 1.443 1.445
1.388 1.375 1.376
1.422 1.435 1.436
1.359 1.299 1.297
E2 GFP A B I I0
1.272 1.286 1.284 1.283
1.464 1.438 1.448 1.448
1.377 1.397 1.397 1.406
1.435 1.406 1.418 1.417
1.402 1.407 1.408 1.408
1.314 1.371 1.318 1.320
1.423 1.408 1.410 1.403
1.392 1.349 1.400 1.411
1.423 1.430 1.438 1.440
1.381 1.371 1.371 1.373
1.409 1.440 1.443 1.450
1.412 1.448 1.449 1.449
1.386 1.368 1.375 1.370
1.424 1.431 1.436 1.439
1.357 1.294 1.293 1.291
WT-GFP A 1.268 B 1.277
1.476 1.449
1.367 1.390
1.442 1.413
1.402 1.412
1.306 1.318
1.414 1.396
1.38 1.398
51.426 1.434
1.387 1.379
1.409 1.435
1.412 1.432
1.392 1.369
1.424 1.430
1.362 1.296
The numbering of the bonds is as in Table 2.
Table 5 a parameter, excitation energies (calculated and predicted) and absorption wavelengths for the models for A, B and I states of EGFP and E2 GFP State
a
Experimental
Calculation
Theoretical prediction
Eex (eV) (O.S.)
Eex (eV)
kabs (nm)
kabs (nm)
EGFP A B I
0.8897 0.7312 0.7096
3.223 (1.27) 2.544 (0.78) 2.414 (0.04)
3.131 2.573 2.497
396 482 497
400a 489a 495c
E2 GFP A
0.8743
3.076
403
410b
B I I0
0.6811 0.6824 0.6565
3.279 (1.35) 3.233e (1.25) 2.621 (1.41) 2.520 (0.63) 2.450 (0.18)
2.397 2.402 2.310
517 516 537
515b
3.219 2.573
385 481
395d 475d
WT-GFP A B
0.9147 0.7312
Oscillator strengths are given in parentheses for the CI calculations. Data from [42] for EGFP. b Data from [44] for E2 GFP. c Data from [45] for S65T. d Data from [46]. e The result of the calculation with a CAS(8,12) for E2 GFP, state A is given. a
shift of I with respect to B is quantitatively reproduced. No measurement of I absorption were reported in the literature for E2 GFP or for spectrally similar mutants.
triangles (open for m1 and filled for m2 ) in Fig. 4 as a function of the calculated absorption energies. The data appear to lie on the correlation lines, thus confirming the picture outlined in the previous sections.
4.2. Vibrational properties The mode frequencies for models A, B and I have been extracted from the velocity autocorrelation spectrum calculated on the thermalized dynamics runs at 300 K [48]. In the cases where the normal modes were not clearly solvable from the spectrum, we also performed a harmonic frequency calculation. The frequency data for the protein models are reported as
5. Conclusions In the present paper we performed an exhaustive study of the structural and vibrational properties (with DFT) and electronic properties (with semiempirical MNDO–CAS-CI method) of several different models for the active site of GFPs. We started from simple
T. Laino et al. / Chemical Physics 298 (2004) 17–28
models and extended our analysis to more realistic models for the WT-GFP and the variants EGFP and E2 GFP, systematically monitoring the effect of each functional group in close contact with the chromophore. This allowed us to achieve some noticeable results. We quantified how the chromophore excitation energy is affected by hydrogen-bond contacts with residues Arg96 and Gln94 on the imidazolidinone oxygen (they produce a red shift of 30–40 nm in the case of the neutral chromophore) and with His148, Thr203, Tyr145 and water molecules on the phenolic oxygen (they produce a blue shift of around 15 nm for the anionic chromophore and a 6 nm red shift for the neutral chromophore). We also estimated the effect of the interaction of aromatic Tyr203 with the chromophore, which affects the excitation of the anionic state (15 nm red shift) while leaving virtually unchanged that of the neutral state. Beyond explaining the effect of mutations such as T203Y, this allowed us to predict the absorption shift of the metastable I intermediate and attribute it to precise structural effects. More in general, we found a quantitative relationship between the optical properties and the structural properties of the chromophore. This was accomplished by defining a parameter that describes the degree of bondlength alternation along the chromophore and the degree of mixing between the benzenoid and quinonoid electronic resonance forms, and pointing out its linear relationship with the excitation energy. This relationship has an experimental counterpart in the observed correlation between the absorption wavelength and a particular vibrational Raman frequency. The latter corresponds to a mode that is partially localized on a bond of the conjugated system, and thus probes its strength and consequently the degree of bond-length alternation. We remark that this relationship allows us to predict the absorption energy of any GFP state based only on the optimized geometry of the chromophore in a realistic environment. Such a relationship is particularly useful since structural properties are generally predicted in a more accurate and simpler way than excitation energies. We tested this approach on the absorption energies of A and B states of mutants EGFP and E2 GFP and found that they are reproduced within 2%. This relationship can also be used to establish some structural requirements for photoinduced dark states with known absorption energy [44]. A detailed study of possible structures of these dark states is currently in progress.
Acknowledgements We thank Vittorio Pellegrini, Fabio Beltram and Paolo Giannozzi for useful discussions and the latter also for having provided the code for the generation of
27
pseudopotential. We acknowledge the allocation of computer resources from INFM Parallel computing initiative, and partial support from INFM section G (Advanced Project PA_G02_3) and section B.
References [1] Recent review articles on GFPs: R.J. Tsien, Annu. Rev. Biochem. 67 (1998) 509; Z. Zimmer, Chem. Rev. 102 (2002) 759. [2] V. Tozzini, V. Pellegrini, F. Beltram, in: W.M. Horsphool, F. Lenci (Eds.), Handbook of Organic Photochemistry and Photobiology, CRC, Washington, DC, 2003, chapter 139. [3] G.J. Palm, A. Zdanov, G.A. Gaitanaris, R. Stauber, G.N. Pavlakis, A. Wlodawer, Nat. Strut. Biol. 4 (1997) 361. [4] K. Brejc, T.K. Sixma, P.A. Kitts, R.K. Steven, R.Y. Tsien, M. Orm€ o, S.J. Remington, Proc. Natl. Acad. Sci. 94 (1997) 2306. [5] M. Chattoraj, B.A. King, G.U. Bublitz, S.G. Boxer, Proc. Natl. Acad. Sci. USA 93 (1996) 8362. [6] H. Lossau, A. Kummer, R. Heinecke, F. P€ ollinger-dammer, C. Kompa, G. Bieser, T. Jonsson, C.M. Silva, M.M. Yang, D.C. Youvan, M.E. Michel-Beyerle, Chem. Phys. 213 (1996) 1. [7] M. Cotlet, J. Hofkens, M. Michael, T. Gensch, M. Van der Auweraer, J. Michiels, G. Dirix, M. VanGuyse, J. Van der leyden, A.J.W.G. Visser, F.C. DeSchryver, J. Phys. Chem. B 105 (2001) 4999. [8] K. Winkler, J. Lindner, V. Subramaniam, T.M. Jovin, P. W€ ohringer, Phys. Chem. Chem. Phys. 4 (2002) 1072. [9] S.B. Nielsen, A. Lapierre, J.U. Andersen, U.V. Pedersen, S. Tomita, L.H. Andersen, Phys. Rev. Lett. 87 (2001) 228102. [10] L.H. Andersen, A. Lapierre, S.B. Nielsen, S.U. Pedersen, U.V. Pedersen, S. Tomita, Eur. Phys. J. D 20 (2002) 597. [11] A.D. Kummer, J. Wiehler, H. Renhaber, C. Kompa, B. Steipe, M.-E. Michel-Beyerle, J. Phys. Chem. B 104 (2000) 4791. [12] R. Heim, R. Tsien, Curr. Biol. 6 (1996) 178. [13] R.M. Wachter, M.-A. Elsliger, K. Kallio, G.T. Hanson, S.J. Remington, Structure 6 (1998) 2790. [14] A.A. Voityuk, M.-E. Michel-Beyerle, N. R€ osch, Chem. Phys. Lett. 272 (1997) 162; A.A. Voityuk, M.-E. Michel-Beyerle, N. R€ osch, Chem. Phys. 231 (1998) 13. [15] W. Weber, V. Helms, J.A. McCammon, P.W. Langhoff, Proc. Natl. Acad. Sci. USA 96 (1999) 6177. [16] C. Scharnagl, R. Raupp-Kossmann, R.S. Fisher, Biophys. J. 77 (1999) 1839. [17] M. St€ ubner, P. Schellenberg, J. Phys. Chem. A 107 (2003) 1246. [18] R. Nifosı, V. Tozzini, Proteins 51 (2003) 378. [19] A.B. Cubitt, L.A. Woolenweber, R. Heim, Methods Cell Biol. 58 (1999) 19. [20] T.M.H. Creemers, A.J. Lock, B. Subramaniam, T.M. Jovin, S. W€ olker, Chem. Phys. 275 (2002) 109. [21] D.A. Becke, Phys. Rev. A 38 (1988) 3098; C. Lee, W. Yang, R.G. Parr, Phys. Rev. B 37 (1988) 785. [22] N. Troullier, J.L. Martins, Phys. Rev. B 43 (1991) 1993. [23] The core radii used for pseudopotential were 1.23 a.u. (C and N) and 1.25 a.u. (O) and the pseudopotential for H was obtained as in L. Pavesi, P. Giannozzi, F.K. Reinhart, Phys. Rev. B 42 (1990) 1864; This set of pseudopotential with a 60 Ryd cutoff was already verified to reproduce the experimental geometries within 0.5% and the vibrational frequencies within 3% (see for instance) V. Tozzini, A.R. Bizzarri, V. Pellegrini, R. Nifosı, P. Giannozzi, A. Iuliano, S. Cannistraro, F. Beltram, Chem. Phys. 287 (2003) 33.
28
T. Laino et al. / Chemical Physics 298 (2004) 17–28
[24] R. Car, M. Parrinello, Phys. Rev. Lett. 55 (1985) 2471. [25] F. Tassone, F. Mauri, R. Car, Phys. Rev. B 50 (1994) 10561. [26] CPMD3.4: Hutter J., Alavi A., Deutsch T., Ballone P., Bernasconi M., Focher P., Fois E., Goedecker S., Marx D., Parrinello M., Tuckerman M., MPI f€ ur Festk€ orperforschung, Stuttgart and IBM Zurich Research Laboratory. [27] W. Thiel, J. Am. Chem. Soc. 103 (1981) 1413. [28] A. Schweig, W. Thiel, J. Am. Chem. Soc. 103 (1981) 1425. [29] G. Granucci, A. Toniolo, Chem. Phys. Lett. 325 (2000) 79. [30] J.J.P. Stewart, MOPAC 2000, Fujitsu Limited, Tokio, Japan, 1999. [31] In our calculation the percentage of HOMO–LUMO transition is 63–67% depending on the model, a value slightly lower than those reported in the literature (up to 80–90%). [32] See for instance R.M. Dreizler, E.K. Gross, Density Functional Theory, Springer, Berlin, 1990 (Chapters 6 and 7). [33] C.H. Choi, M. Kertesz, J. Chem. Phys. 107 (1997) 6712. [34] E.A. Perpete, B. Champagne, Theor. Chem. 487 (1999) 39. [35] Protein Data Bank. Available from:
. [36] M.A.L. Marques, X. L opez, D. Varsano, A. Castro, A. Rubio, Phys. Rev. Lett. 90 (2003) 258101. [37] V. Tozzini, R. Nifosı, J. Phys. Chem. B 105 (2001) 5797.
[38] A.F. Bell, X. He, R.M. Wachter, P.J. Tonge, Biochemistry 39 (2000) 4423. [39] X. He, A.F. Bell, P.J. Tonge, J. Phys. Chem. B 106 (2002) 6056. [40] A.P. Esposito, P. Schellenberg, W.W. Parson, P.J. Reid, J. Mol. Struct. 569 (2001) 25. [41] We heated the system for 1 ps up to 300 K, then we performed microcanonical dynamics for about 2 ps and reduced again the temperature to zero during about 1 ps. [42] A.D. Kummer, C. Kompa, H. Lossau, F. P€ ollinger-Dammer, M.E. Michel-Beyerle, C.M. Silva, E.J.B. Bylina, W.J. Coleman, M.M. Yang, D.C. Youvan, Chem. Phys. 237 (1998) 183. [43] A.A. Heikal, S.T. Hess, W.W. Webb, Chem. Phys. 274 (2001) 37. [44] R. Nifosı, A. Ferrari, C. Arcangeli, V. Tozzini, V. Pellegrini, F. Beltram, J. Phys. Chem. B 107 (2003) 1679. [45] T.M.H. Creemers, A.J. Lock, V. Subramaniam, T.M. Jovin, S. V€ olker, Proc. Natl. Acad. Sci. USA 97 (2000) 2974. [46] F. Yang, L.G. Moss, G.N. Phillips Jr., Nat. Biotechnol. 14 (1996) 1246. [47] M.A. Lill, V. Helms, Proc. Natl. Acad. Sci. USA 99 (2002) 2778. [48] J. Kohanoff, Comput. Mater. Sci. 2 (1994) 221. [49] M. Kurimoto, P. Subramony, R.W. Gurney, S. Lovell, J. Chmielewski, B. Kahr, J. Am. Chem. Soc. 121 (1999) 6952.